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I.  Introduction 


The  problem  of  optimum  detection  of  signals  in  stochastic  noise  has 
been  solved  only  in  a  relatively  few  cases.  Most  investigators  assume  the 
signal,  or  the  noise,  are  Gaussian;  however  in  many  very  important  practical 
situations  (e.g.,  radar,  sonar  or  satellite  transmission),  this  assumption 
does  not  hold  (cf  [10],  [12],  [25],  L26]). 

Luganr.ani  and  Thomas  (1967)  developed  the  class  of  linear  processes, 
as  a  potential  model  for  noise,  and  showed  that  the  class  was  closed  under 
linear  transformations,  a  desirable  property  for  modeling  purposes.  For  a 
very  specialized  type  of  linear  process,  Eastwood  .and  Lugannani  (1977) 
were  able  to  construct  an  approximation  to  the  n-dimensional  densities  of 
two  linear  processes  evaluated  at  (t^,  t2,...,  t^) .  Consequently  they  were 
able  to  obtain  a  likelihood  ratio  test  approximation  for  this  special  class 
of  processes.  It  is  of  some  considerable  importance  that  the  Middleton 
model  for  acoustical  reverberation  [18],  [19],  [20],  [21]  (which  has  been 
called  the  most  complete  theoretical  model  for  reverberation  [8])  is  a 
linear  process,  as  are  several  other  models  for  noise  derived  from  purely 
physical  reasoning.  Indeed,  it  was  the  physics  of  the  acoustical  noise 
process,  and  the  goal  of  obtaining  optimal  signal  processing  which  led  to 
this  problem. 

In  t! is  paper  we  shall  show  how  to  identify  a  linear  process  as  a 
subclass  of  infinitely  divisible  processes  (i.e.,  processes  with  infinitely 
divisible  finite  dimensional  marginal  distributions).  Using  the  results 
of  Maruyama  (1970),  Brown  (1971),  Briggs  (1975),  Skorokhod  (1964)  and 
Veeh  (1981),  we  are  then  able  to  explicitly  calculate  the  Radon-Ni kodym 
derivative  of  the  measures  determined  on  function  space  by  two  infinitely 


divisible  processes  and  to  find  its  distribution.  This  extends  the  results 
of  several  authors  and  enables  us  to  use  the  Neyman- Pearson  lemma  to  obtain 
an  optimal  detector  applicable  directly  to  the  sample  paths  of  many  stochastic 
process  models  derived  in  physics  for  noise  processes. 
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II.  Linear  Processes 

In  this  section  we  shall  present  a  brief  description  of  the  linear 
process.  A  more  extensive  discussion  and  references  are  given  in  Lugannai 
and  Thomas  (1967) . 

The  linear  process  Y ( t '  is  defined  by  the  stochastic  integral 
b 

(1)  Y(t)  =  /  f(t,s)dX(s) 

a 

where  X(s)  is  a  zero-mean,  second  order  stochastically  continuous  process 

with  independent  increments.  Additionally,  f(t,s)  is  real  valued  and  square 

o 

integrable  with  respect  to  dV(s)  =  E|dX(s)|  .  In  brief,  Y  is  an  L2  filtering 
of  an  independent  increment  process.  We  shall  additionally  assume  f  is 
L2  continuous  (so  that  Y(t)  is  stochastically  continuous).  As  particular 
cases  we  have  the  Gaussian  process  and  the  shot  noise  process. 

following  the  method  used  in  Papoulis  (1965)  for  shot  noise  process, 
one  may  determine  the  finite  dimensional  characteristic  functions  of  the 
linear  process  (1)  (cf  Lugannani  and  Thomas  (1967)  or  Eastwood  and 
lugannani  (1977)).  In  the  case  without  Gaussian  component,  they  are  given  by 

b  °° 

(2)  0.  (u)  =  exp{/  /{exp(izw)  -  1  -  izw}M(ds,dz) 

~  ~  a  0 

where  t  =  (tj ,. .  .tn),  u  =  (u^ ...  .u  )  and  w  =  u-|f(tj,s)  +  u2f(t2,s)+. . . 

+  u„f(t  ,s).  The  measure  M  is  the  time-jump  measure  of  the  additive  process 
X,  i.e.,  M((s-j,s2]  x  A)  is  the  expected  number  of  jumps  (pulses)  of  the 
process  X  during  the  time  interval  (s-j , s2 ]  with  the  magnitude  (amplitude) 
in  the  Borel  set  A.  See  Gikhman  and  Skorokhod  (1969)  for  a  more  detailed 
explanation  of  the  Levy  measure  M  and  its  properties.  We  have  deleted  the 
Gaussian  component  of  Y(t)  from  (2)  because  it  is  independent  of  the 
non-Gaussian  part,  and  it  has  already  been  treated  extensively  in  the  literature. 
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In  the  next  section  we  shall  describe  the  infinitely  divisible  processes 
and  show  that  the  linear  process  is,  in  fact,  an  infinitely  divisible  process 
of  a  very  special  type. 


III.  Infinitely  Divisible  Processes 


The  class  of  infinitely  divisible  stochastic  processes  was  evidently 
first  studied  by  Lee  (1967),  and  subsequently  studied  by  Maruyama  (1970), 

Briggs  (1975)  and  Veeh  (1981).  A  stochastic  process  is  called  infinitely 
divisible  if  all  of  its  n-dimensional  marginal  distributions  are  n-dimensional 
infinitely  divisible  random  vectors.  Gaussian  processes  are,  of  course, 
infinitely  divisible  and, for  those  second  order  processes  without  a  Gaussian 
component,  the  following  representation  holds.  By  definition,  for  every 
finite  subcollection  X  =  (t-j ,  t2>...,  t  }  £;[a,b],  there  exists  a  random 
vector  c,,  and  an  n-dimensional  Levy  measure  M.  such  that  the  characteristic 

A  A 

function  of  ^Y(t-| ) , . . .  Y(tn)j  is 

(3)  ln$A(u)  =  iu'cA  +  /{exp(iu'x)  -  1  -  iu'x}dMA(x) 

Here  we  have  used  a  variant  of  the  Kolmogorov  representation  valid  for  second 
order  infinitely  divisible  vectors  (cf  Lukacs  (1970)  p.  119). 

Let  A  =  {X  =  {t-|,...t  }}  denote  the  set  of  all  finite  subsets  of  [a ,b] . 

The  collection  ((cA,  MA),  A  e  A]  uniquely  determines  the  distribution  of  an 
infinitely  divisible  process  Y,  and  vice  versa  (Maruyama  (1970)  Theorems 
1  and  3).  Using  the  partial  ordering  on  A  by  inclusion,  we  obtain  a  system 
of  projections  {PA  X  e  A}  from  function  space  IR^-3’^  onto  the  coordinate 
space  IR  \  The  system  of  Levy  measures  (MA>  X  c  A)  is  consistent,  and 
Maruyama  shows  that  a  measure  Q  may  be  defined  on  lR^3,t^  as  the  projective 
limit  of  the  collection  {MA,  X  e  A}.  The  a-algebra  on  IR^3’*^  is  the 
usual  product  o-algebra,  and  the  construction  of  Q  is  similar  to  the  usual 
construction  in  Kolmogorov's  existence  theorem  for  processes  with  given  marginals. 
Thus,  corresponding  to  an  infinitely  divisible  process  there  is  a 
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function  c(t),  and  a  measure  Q  living  on  function  space  such  that  for 
A  =  {tr...tnh  PAc  =  (c(t1),...c(tn))  =  cA  and  QP^V)  =  (A)  are  the 

parameters  of  the  infinitely  divisible  random  vector  (Y (t-j ) , . . . Y ( tn ) ) .  We 
shall  see  that  the  projective  limit  Q  is  actually  explicitly  calculatable 
in  the  linear  process  case. 

To  obtain  the  appropriate  representation  of  a  linear  process  as  an 
infinitely  divisible  process,  we  first  manipulate  the  characteristic  function 
given  by  (2)  into  the  form  of  (3).  Towards  this  end  we  first  note  that 
if  A  =  lt,,...,t  }  is  given,  and  M.  is  defined  on  IK  ^  via  M.(A) 

=  M({(s,z):  (zf(t,,s),...,zf(tn,s))  c  A}).  Then  is  a  Levy  measure  on 
IR  A  concentrated  on  the  curve  (zf(t, ,s) ,. . . ,zf (tp,s) ) ,  s  e  [a,b],  z  e  IR  . 

Moreover,  the  integral  relationship 

b  °° 

/h(x)dM  (x)  =  /  /  h(zf(t1,s),...,zf(tn,s))M(dz,ds)  holds  for  measurable  h. 

a  -c°  ^ 

The  fact  that  M  is  indeed  a  Levy  measure  on  IR  follows  (after  some 

A 

calculations)  from  the  square  integrabil i ty  of  f  with  respect  to  V  and 
from  the  formula  /h(s)dV(s)  =  //z2h(s)M(ds,dz)  which  relates  the  variance 
measure  to  the  time-jump  neasure  M. 

Now  let  us  write  f  (s)  =  P^f(  *  ,s)  =  (f(t-j  ,s) ,. . .  ,f(tn»s))  and  cA  -  0. 

We  observe  that 

oo 

iu'c^  +  /{exp(iu'x)  -  1  -  iu'x}  Mx(dx) 

"  ""  oo 

=  //{exp(izu'fx(s))  -  1  -  izu'fA(s)}M(ds,dz) 

=  //{exp(izw)  -  1  -  izw.>M(ds  ,dz ) 

where  w  =  u^t^s)  +  u2f(t2,s)  +...+  unf(tn,s)  as  before.  Thus  (2)  is 
of  the  form  (3).  It  follows  that  linear  processes  are  in  fact  infinitely 
divisible  processes.  Moreover,  we  can  determine  the  projective  limit  Q  of 
the  system  of  Levy  measures  (M^,  A  c  A}.  Namely,  if  A  c  IR  ^  ^  is  a 

collection  of  functions,  then  Q(A)  =  M({(s,z):  zf(-,s)  c  A}).  This  follows 
since  if  B  c  IR  \  then  MX(B)  =  M({(s,z):  (zf(t]  ,s) , . . . zf (tp ,s) )  eB)) 
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=  M({(s,z):  zP  f(*,s)  e  B})  =  Q( P"1 (B) ) ,  so  that  the  Xth  coordinate  projection 

A  A 

of  Q  is  M  . 

A 

In  the  next  section  we  show  how  to  use  the  results  in  Skorokhod  (1964), 
Briggs  (1975),  Brockett,  Hudson  and  Tucker  (1978)  and  Brown  (1978) 
to  explicitly  calculate  the  Radon-Ni kodym  derivation  (likelihood  ratio)  of 
two  linear  processes  on  the  function  space  IR^a’^.  Note  that  since  Y(t) 
is  assumed  to  be  stochastically  continuous,  the  projective  Levy  measure  Q 
on  ]R^a’^  is  o-finite  by  Proposition  3.1  in  Maruyuma  (1970).  Along  the 
way  we  extend  the  existing  theorem  on  likelihood  ratios  for  infinitely 
divisible  processes. 


IV.  The  Likelihood  Ratio 


We  begin  by  sketching  the  construction  of  an  infinitely  divisible 
process  Y  as  a  limit  of  integrals  of  Poisson  random  measures.  For  details 
see  Gikhman  and  Skorokhod  (1969)  or  Maruyuma  (1970). 

Let  If  be  a  Poisson  random  measure  on  IR^a’^  which  has  the  corresponding 
intensity  measure  Q,  i.e.,  for  any  set  ScR^a’^  with  Q(A)  <  dT^A)  is 
a  Poisson  random  variable  with  expectation  Q(A).  Moreover,  if  A, ,...,An 
are  disjoint  sets,  then  ir(A. ) , . . .  ,*rr(A  )  are  independent  random  variables. 

See  Kallenburg  (1976)  for  details. 

The  random  measure  ='ir(A)  -  Q(A)  is  used  to  give  a  pathwise 

representation  of  the  second  order  linear  process  Y,  namely  we  may  write 

(4)  Y(t)  =  lim  in  Prob.  /  x  f  t)rr*(dx )  where  A  =  { x :  [ x ( t )  _>  c) 

cfO  A  E 

c 

io  see  this,  note  that  Y,  =  ( Y ( t ^ ),..., Y(t  )  has  a  characteristic  function 

n 

given  by  log  y,  (u)  =  log  E(exp  i  Z  u.Y(t.)) 

X  ~  j=l  J  J 

n  n 

=  /{exp( i  z  u  .  X  (t . ) )  -  1  -  i  Z  u.X(t  )}Q(dx) 
j=l  J  J  j=l  J  J 

=  /{exp  (iu'P  x)  -  1  -  iu'P  x)Q(dx) 

_  A  A 

=  /{exp  (iu'y)  -  1  -  iu'y}Mx(dy). 

Thus,  the  finite  dimensional  distributions  given  by  the  right  hand  side  of 
(4)  agree  with  those  of  the  linear  process. 

We  are  now  ready  to  calculate  the  likelihood  ratio  of  two  infinitely 
divisible  processes  without  trend  functions.  The  multivariate  Levy  measures 
M-j  and  induce  (via  projective  limits)  the  measures  and  on  function 
space  as  described  earlier.  The  processes  Y.  i  =1,2  determine  measures 
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on 

"1 


function  space  via  u^(A)  =  P  [  Y  ^.  ( * )  t 
<<  \-2,  and  the  corresponding  density 


A], 


and  we  wish  to  determine  when 
(x),  x  f.  IR^a,b^. 


The  following  theorem  generalizes  the  results  of  Briggs  (1975)  to  include 
general  infinitely  divisible  processes  (not  just  those  with  non-atomic 
projective  measures  Q).  As  mentioned  by  Veeh  (1981)  it  is  not  clear  just 
how  restrictive  Briggs'  assumptions  are.  Certainly  for  some  sonar  problems 
they  are  too  restrictive.  Additionally  it  generalizes  the  results  of 
Veeh  (1981)  and  Brockett,  Hudson,  and  Tucker  (1978).  Moreover,  by  using 
the  Maruyuma  representation  (4)  for  infinitely  divisible  processes,  together 
with  the  results  of  Brown  (1971)  on  Poisson  point  processes,  we  are  able 
to  substantial ly  reduce  the  length  and  complexity  of  the  proof  of  both  the 
results  of  Briggs  (1975),  and  of  Brockett,  Hudson  and  Tucker  (1978).  A  key 
step  is  the  following  result  (Lemma  1)  for  Poisson  point  processes  due  to 
M.  Brown  (1 971 ) . 

First  we  recognize  that  a  Poisson  point  process  iron  (*,£)  can  be 
regarded  as  a  probability  measure  on  (A, A,)  where  A  is  the  set  of  all 
countable  subsets  of  atoms  of  X  (multiple  occurrence  permitted)  and  /V,is 
the  minimal  a-algebra  containing  all  sets  of  the  form  {a  e  A:  ir(C,a)  =  k) 
k  =  0,1,...,®,  C  e  f  and  where  1T(C,a) ,  a  c  A  is  defined  as  the  number  of 
atoms  comprising  a  e  A  which  are  contained  in  C  (counting  multiplicity). 

With  this  correspondence  in  mind,  we  have: 

Lemma  1  (Brown  1971)  Let  and  IT^  be  Poisson  point  processes  over  (*,C) 
with  o-finite  mean  measures  and  Q^.  Then'll  <<  ir^  if  the  following  hold 

i)  Q1  «  Q2 

ii)  Q](B^)  <  Qp(Bt)  <  00  for  all  t  >  0 

iii)  /  (p  -  D2dQ2 

BC 
t 


<  oo 
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where  p  =  and  B^.  =  (x:  |p(x)  -  lj  >  t).  Moreover,  if  and  Q2  are 


-(x) 


finite  measures , 


then  ( { x . } )  =  exp[-(Qj  (*)  -  y*)]  (x.)  where 


{x.}  =  {x.(oj)}  is  the  particular  realization  under  evaluation. 

It  should  be  noted  that  by  the  techniques  Newman  (1973)  or  Hudson 

and  Tucker  (1975)  it  can  be  shown  that  conditions  (ii)  and  (iii)  together 

1/2  p 

are  equivalent  to  the  condition  /(I  -  p  )  dQ^  <  =». 

We  now  state  our  first  results  concerning  the  case  with  projective  mean 
measures  Q1  and  Q2  finite.  (This  is  the  interesting  case  for  acoustical 
signal  detection  using  Middleton's  model  for  surface  reverberation  for 
randomly  rough  surfaces,  see  Brockett  and  Wilson  (1982).) 


Theorem  1  (Non-Stationary  Compound  Poisson  Case) 

Suppose  Y.j  (t)  and  Y2(t)  are  two  stochastically  continuous  infinitely 
divisible  processes  with  corresponding  projective  limit  measures  Q-j  and 
q2  finite. 

a)  If  Q1  «  Q2  and  /x(Q1 -Q2) (dx)  =  0,  then  =  PYj1  <<  u2  =  PY^1  * 
Moreover,  using  the  representation  (4), 

(Y,(‘))  =  /^wptxjir^dx)  +  Q2(lR[a’b])  -  Q1  (IR  [a’b!i)  where  p(x)  =  ^(x). 

b)  The  pj  distribution  of  the  log  likelihood  ratio  in  (a)  is  given 
via  the  characteristic  function  whose  logarithm  is 

iii  ;(u )  =  iu[(Q2-Q-|  )(IR  ^a,b^)]  +  /(exp{iuf«p(x)}  -1)  Q-jfdx).  Thus  In  -~ 

(Y^(*,w))  is  a  translated  compound  Poisson  random  variable  on  IR  with  intensity 
measure  v(A)  =  Q-j({x:  f«p(x)  c  A}). 

Proof  The  proof  of  a)  will  follow  immediately  from  Lemma  1  once  we  notice 
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that  according  to  (4),  and  the  corresponding  representation  for 

point  processes  as  measures  on  a  sequence  space,  we  have  Y(-,w) 

where  S ( l  x . } )  =  £  x.  -  c,  =  ix,  -  c?  and  (x.)  is  a  realization 
^  i  >0  1  1 

process  IT.  Here  c^  =  /xdQ^(x)  =  c^  =  /xdt^x)  by  the  assumption 

Now,  under  the  assumptions  of  the  theorem. 


Poisson 
=  S({xi(w)l) 
of  the  point 
/x d(Q1-Q2) (x)  =  0. 


=  T-jS" 


<<  y , 


=  tr2s 


-l 


and  by  the  lemma. 


dy,  dinS"1  dTT,  ,  chr, 

^(v’(-1)  =  ^'f)=4(s' f)=4(u<(",,) 

IT  ( 1R  ^  ’  ^3  ) 

=  expC(Q-| ( IR  [a,b])  -  Q2(IRta,b])]  n1p(xi (a>)) , 


which  is  the  formula  in  a)  once  the  product  is  converted  to  integral  form. 

Here  we  have  used  the  fact  that  if  v  and  n  are  two  measures  on  *  with 

v  <<  n,  and  S:  *  V,  then  vs”1  <<  ns-1  and  -  ~ T(y)  =  ^-(s~V)*  See  Lemma  1 

dns"  dn 

of  Brockett,  Hudson  and  Tucker  (1978).  Note  that  (i 1 )  and  (iii)  of  the 
lemma  are  obviously  satisfied  in  this  finite  measure  case. 

To  prove  b)  we  simply  notice  that,  according  to  the  lemma,  we  are 
dealing  with  a  Poisson  sum  (e.g. ,  (IR^a,b^))  of  random  variables, 

in  p(x.(u)).  The  characteristic  function  now  follows  from  routine  calcu¬ 
lations. 

Using  the  results  of  Theorem  1,  it  is  now  just  a  short  step  to  obtain 
the  general  theorem. 

Theorem  2 

Suppose  Y^(t)  and  Y^(t)  are  two  stochastically  continuous  infinitely 
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divisible  processes  with  corresponding  projective  limit  measures  Q,  and  CL. 

v  dQ,  1  2 

a)  If  i)  Q1  <<  Q2  with  P(x)  =  ^q“( x ) 

ii)  /xd(Q1-Q2)(x)  =  0 

iii)  /(l-p1/2(x))2dQ2(x)  <  « 
then  =  PY^  ^  <<  =  PY2  \ 

du, 

b)  Under  the  conditions  of  a )  in  ^J-(Y1  ( • ) )  =  fdi  r(x)ft*(dx)  +  fin  p  (x)Q,  (dx) 

2  ^  6t 

+  /[I  -  p (x )  +  £iip(x)]Q,  (dx)  +  /[l  -  p(x)]Q,(dx)  where,  as  before, 

IT*  =  TT1-Q1  and  Bt  =  {x :  j p (x)  -  1|  >  t). 

c)  The  logarithm  of  the  characteristic  function  of  in  (dw-,/dp2)  is 
(U  /(l  -  s(x)  t  -Jssdi)  )dQ  t  /(eiuy  .  ,  .  iuy  )0  g-l(dy) 

H(D,p(x))"  1  1+y2  1 

where  g  =  Dtp.  Thus  it  is  the  translate  of  an  infinitely  divisible 
random  variable  with  Levy  measure  M(A)  =  ( { x :  dip(x)  e  A}). 

Pr0°f  Veeh  (1981)  proves  a),  or  it  could  be  derived  from  Lemma  1  in  the 
previous  manner.  The  proof  of  b)  is  given  in  Briggs  (1975),  and  can  follow 
from  Theorem  1  by  using  her  techniques.  It  should  be  noted  that  she  does 
not  explicitly  state  assumption  (ii),  although  it  is  used  in  her  proof.  An 
alternative  proof  for  b)  can  be  constructed  from  Theorem  1  by  using  the 
techniques  of  Brockett,  Hudson  and  Tucker  (1978).  The  distribution  in  c) 
is  obtained  by  a  limiting  argument  from  Theorem  1  after  appropriately 
centering  in  a  manner  analogous  to  that  used  in  Brockett,  Hudson  and  Tucker 
(1978). 
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V.  Linear  Processes  with  Gaussian  Components 

In  this  section  we  show  an  alternative  formula  for  the  likelihood  ratio 
when  both  driving  functions  X-j  and  X2  do  have  Gaussian  components.  Our 
development  requires  the  additional  assumption  that  the  same  filter  is  used 
on  both  processes,  i.e.,  f^(t,s)  =  f2(t,s).  The  key  step  in  the  development 
is  the  following  very  general  result  due  to  Skorokhod  (1969,  p.  245,  Theorem  2) 
which  of  interest  in  its  own  right. 

Lemma  2  (Skorokhod  1964)  Suppose  X-j(t)  and  X2(t)  are  two  stochastic 
processes  inducing  measures  v-|  =  PX^1  and  \>2  =  PX^  on  function  space.  Let 
S  be  a  measurable  mapping  from  function  space  to  function  space,  and  =  SX^ , 
Y2  =  SX2  be  two  stochastic  processes  with  induced  measures  u-|  =  PY^1  and 

u2  =  PX^1.  If  v,  «  v2  then  ^  «  u2  and  ^(Y2(t))  =  E[~ <X?(t) )  1  YgU)]. 

We  shall  also  use  the  results  of  Brockett,  Hudson  and  Tucker  (1978) 
and  Brockett  and  Tucker  (1977)  for  processes  with  independent  increments, 
summarized  in  the  Lemma  3.  Let  X-j(t)  and  X2(t)  be  stochastical ly  continuous 
processes  with  independent  increments.  Their  characteristic  functions  are 
of  the  form 

fx(t)  (u)  =  exp  (iua(t)  -  a2(t)u2/2  +  /(eiux  -  1  -  i~)M.(dx)} 

1  1+x 

(5)  and 

fx(t)  (u)  =  exp  { iub(t)  -  r2(t)u2/2  +  /(elux  -  1  -  N.(dx)}. 

2  l+x4  L 

The  time- jump  measures  M  and  N  are  defined  as  before:  (cf  the  sentence 
following  equation  (2)).  The  functions  a  (t)  and  r  (t)  are  continuous,  non¬ 
negative,  and  nondecreasing  and  correspond  to  the  Gaussian  components  V^(t) 


and  V2 ( t )  respectively. 
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Lemma  3  Suppose  X-j  and  X2  are  processes  with  independent  increments  over 
[0,T],  and  v..  =  PX^  i  =  1,2  are  the  induced  measures  on  function  space 
D[0,T]. 

a)  ^  (i.e.,  v i  <<  v> ^  and  <<  v-j)  if  and  only  if  the  following 

all  hold: 

(i )  M  -v  N 

(ii)  o2(t)  =  r2(t),  0  <  t  1  T, 

(iii)  /(l-(dM/dN)1/2)2dN  <  «,  and 

(iv)  for  some  peL,,([0,T],  dc2)  and  all  t  E  [0,T), 

a(t)  -  b(t)  -  -  Nj(dx)  =  p ( 8 ) do 2 ( 9 ) . 

1+x*  1  1  u 

b)  If  <v  on  D[0,T],  then 

1  T  IT?? 

=  exp  p(8)dV(0)  -  p^(0)do^(0) 

+  !  (K&f(s,Js):  0iSiT,  0scln  -  /0  <$-  DUN)), 
n- 1  n 

Gn  =  [0 ,T]  x  In,  p(0)  is  as  defined  in  a(ir),  Ir  =  [cn ,enl  )U( -cn3 

with  en  4-  0,  and  is  the  size  of  the  jump  at  time  s. 

Using  these  results  we  may  now  state  the  conditions  for  equivalence 

directly  in  terms  of  the  processes  X^  and  X^. 

Theorem  3 

Suppose  Y-]  and  are  linear  processes  given  by  (1)  with  the  same 
filter  function  f  for  both  Y1  and  Y^ .  Let  the  distributions  of  the  driving 
processes  be  given  by  (5) .  If  (i)  -  (iv)  of  Lemma  3a  all  hold,  then 
P1  =  ^  Pj  :  and  du^/du^  is  given  by 


du,  dv.  dv, 

3“  (Y-j  (t) )  =  E[^j-  (X^t))  1  Y i  ( t )  =  /f(t,s)dXj(s)]  where  is  found 
explicitly  in  Lemma  3b. 


-1  -1  ^V1 

Proof  By  Lemma  3  it  is  clear  that  v,  =  PX,  %  -  PX0  and  -r—  is  as  given. 

-  I  l  2  c  Q\>2 

We  now  use  Lemma  2  to  calculate  the  general  likelihood  ratio.  Let 
S:  D[0,T]  -*•  D[0,T]  be  defined  by 

(Sg) (t)  =  /f(t,s)dg(s)  =  lim  Ef(t,Sj){g(Sj)  -  g(Sj)} 

where  the  limit  is  as  the  mesh  of  the  partition  converges  to  zero.  By  the 
definition  of  Y.  given  in  equation  (1),  and  by  the  results  of  Brockett  and 
Hudson  (1981)  it  is  easily  seen  that  S  is  defined  a.s.  with  respect  to  both 
the  measures  =  PX~^  and  ^  =  PX^ .  The  stated  likelihood  ratio  now 
follows  from  Lemma  2. 
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ABSTRACT. 

Let  f  be  a  density  on  R^,  and  let  f  be  the  kernel  estimate  of  f, 

fn(x)  =  (nhV1  l  K((x-X.)/h) 
n  i=i  1 

where  h=hn  is  a  sequence  of  positive  numbers,  and  K  is  an  absolutely  integrable 

function  with  /K(x)dx  =  1.  Let  J  =/|f  (x)-f(x)|dx.  If  lim  h  =  0  and  lim  nhd  = 

n  n  n  n 

then  Jn  -*•  0  completely  as  n  If  J  0  in  probability  as  n  -+•  °°,  And  K  is 

nonnegative  and  has  compact  support,  then  lim  h  =  0  and  lim  nh^  =  °°. 

n  n 
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1.  INTRODUCTION. 

The  purpose  of  this  paper  is  to  point  out  that  for  the  celebrated  Parzen- 
Rosenblatt  density  estimate  (Parzen,  1962 ;  Rosenblatt,  1956)  all  types  of 
Lj  consistency  are  equivalent.  We  consider  a  sample  Xj,...,Xn  of  independent 
Revalued  random  vectors  with  common  density  f,  and  estimate  f(x)  by 

j  ■  1  n 

f(x)  =  (nhd)  l  K(  (x-X . )/h) 
n  i=l  1 

where  h=hn  is  a  sequence  of  positive  numbers  and  K  is  a  Borel  measurable  function 

satisfying  K  >  0,  /K  =  1.  The  natural  measure  of  the  closeness  of  f  to  f  is  its 
~  n 

Lj  distance. 


Jn  =  • 


Our  main  result  is 

Theorem  1.  Let  K  be  a  nonnegative  Borel  measurable  function  on  Rd  with  /K(x)  dx  =  1. 
Assume  that  K  has  compact  support.  Then  the  following  conditions  are  equivalent  : 

(i)  J  ♦  0  in  probability  as  n  +  «  ; 

(ii)  Jn  -v  0  almost  surely  as  n  »  ; 

{ i i i )  Jn  -*■  0  completely  as  n  -*■  »  (  i.e.  £P(Jn>  e)  <  °°»  all  e  >  0) 

H  ^ 

(iv)  lim  h  =  0  and  lim  nh  =  «. 
n  n 

Also,  (iv)  implies  ( i i i )  for  all  K  that  are  absolutely  integrable  and  satisfy 
/K(x)dx  =  1. 


2 


Remark  1.  The  condition  that  K  has  compact  support  is  used  in  the  proof  of 
(i)  =>  (iv).  If  K  is  any  density  on  R0*,  and  (iv)  holds,  then  we  will  see  that 
P( Jn  >  e)  =  0(n~^)  for  all  e  ,  q  >  0. 

Remark  2.  If  (i)  holds  for  some  K  satisfying  the  conditions  of  Theorem  1, 
some  f  and  some  sequence  h,  then  (i v) ,  (i i i ) , (i i )  and  (i)  hold  for  all  densities 

K  and  f. 

We  would  like  to  clarify  the  connection  between  convergence  and 
pointwise  convergence  almost  everywhere.  Pointwise  convergence  almost  everywhere 
is  a  stronger  concept  as  can  be  seen  froip  the  following  variation  of  Scheffe's 
theorem  (Scheffe,  1947)  : 

Lemma  I.(Glick,  1974).  If  f^  is  a  sequence  of  density  estimates  (i.e.,  for  fixed  n, 
fn  is  a  Borel  measurable  function  of  x,Xj,...,Xn)  and  each  f^  is  a  density  in  x, 
then  Jn  ■*  0  in  probability  (almost  surely)  as  n  +  »  when  fR(>0  -*■  f(x)  in  probability 
(almost  surely)  for  almost  all  x. 

From  Lemma  1  and  well-known  results  about  the  pointwise  convergence  of 
kernel  estimates,  one  can  deduce  convergence  results  that  hold  for  all  f  (Devroye 
and  Wagner,  1979).  Unfortunately,  for  kernel  estimates,  strong  pointwise  convergence 
almost  everywhere  is  a  strictly  stronger  property  than  strong  convergence,  as 
can  be  deduced  from  Theorem  1  and  Lemma  2  : 

Lemma  2. (Deheuvels ,  1974).  If  f  is  bounded  and  continuous,  if  K  is  bounded  and 

Riemann  integrable,  if  h  -*■  0  as  n  and  inf  (n+k)h^+|,/(nh^)  >  0,  then 

n,x>0  d 

f  (x)  -+  f ( x)  almost  surely  for  all  x  if  and  only  if  nh  /log  log  n  -*•  »  as  n  -*•  ®. 
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Thus,  the  indirect  type  of  proof  based  on  Glick's  Theorem  (Lemma  1)  does 
not  give  the  sharpest  possible  L^  convergence  results  except  possibly  in  the  weak 

case. 


An  analogue  of  Theorem  1  can  be  given  for  several  other  density  estimates, 

such  as  recursive  versions  of  the  kernel  estimate  (surveyed,  for  example,  in 

Devroye  (1979)).  The  nearest  neighbor  estimates  are  not  in  L^,  and  cannot  be 

considered  in  the  same  light.  In  a  series  of  papers,  Abou-Jaoude  (1976a,  1976b, 

1976c)  deals  with  a  weak  analogue  of  Theorem  1  for  the  histogram  estimate.  His 

main  result  is  summarized  below.  Let  ¥n  be  a  sequence  of  countable  partitions  of 

Rd  into  A-finite  sets  where  A  is  the  Lebesgue  measure  :  ¥n={An.  .A^,  •  ■  • )  >  0  <  x(Ani) 

<  «•  .  For  x£Rd,  let  A  (x)=A  .  if  and  only  if  xeA  ..  Assume  that  the  Borel  sets  are 

n  nj  nj 

generated  by  U  ¥  .  Define  the  histogram  estimate  by 
m>n  m 


fn(x)  =  (A(An(x)))’1 


1  n 

i  .V, 


n 


(1) 


Here  I  is  the  indicator  function. 


Theorem  2. (Abou-Jaoude,  1976a,  1976c).  Let  f  be  estimated  by  f^,  defined  by  (1). 
If  J  -*■  0  in  probability  as  n  +  «  for  al  1  f,  then 


(i)  for  all  Borel  sets  B  of  finite  A-measure,  and  all  e  >  0, 

there  exists  nQ  such  that  for  all  n  >_  nQ,  there  exists  a  set 
Bn  in  the  o-algebra  generated  by  ¥n  such  that  A(B^Bn)  <  E; 


(ii)  sup  lim  sup 

M  >  0  n 

all  Borel  sets  C 
with  a(C)  <oo 


Vl(Anj>>M/n 


A  .nc)  =  0. 
nj 


If  (i)»(ii)  hold,  then  Jn  -*■  0  completely  as  n  +  ”  for  all  f. 
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For  example,  if  *?  consists  of  all  sets  of  the  form  n  [a.b  ,(a.+l)b  ) 
where  a^,...,a^  are  all  integers,  and  bn  is  a  sequence  of  positive  numbers,  then 
(i)  and  (ii)  hold  if  and  only  if  b^  -*•  0  and  nb^  -+  as  n  -*■  «.  We  note  here  that 
the  necessity  of  (i),(ii)  in  Theorem  2  follows  from  ■+  0  in  probability  as  n  -*■  •» 
for  all  f".  In  contrast,  the  necessity  of  (iv)  in  Theorem  1  follows  from  0 

in  probability  as  n  ■*  “  for  some  f". 


Abou-Oaoude1 s  ground-breaking  work  excepted,  very  little  is  known  about  the 
Lj  properties  of  density  estimates.  Some  i  ‘ormation  about  the  best  possible  rate 
of  convergence  to  0  of  can  be  found  in  Bretagnolle  and  Huber  (1979).  For  more 
references  on  density  estimation  in  general,  the  reader  should  consult  Wegman  (1972), 
Wertz  (1978),  Wertz  and  Schneider  (1979)  or  Bean  and  Tsokos  (1980). 


2.  PROOF  OF  THEOREM  1. 


We  will  try  to  extract  the  key  facts  needed  in  the  proof  of  Theorem  1. 
They  are  condensed  in  several  Lemmas  of  independent  interest.  Lemmas  3  and  4 
are  integral  and  pointwise  versions  of  the  Lebesgue  density  theorem.  Lemma  5 
contains  a  crucial  inequality  for  the  multinomial  distribution,  and  Lemma  6 
establishes  that  ( i v )  =>  (iii)  without  the  compactness  condition  for  K. 

Lemma  7  is  an  L^^  version  of  the  non-existence  of  unbiased  kernel  density 
estimates,  and  in  Lemmas  8,9  and  lOwe  present  some  unimportant  rather  basic 
inequalities.  The  necessity  of  ( i v )  for  (i)  is  established  in  Lemma  11.  Since 
(iii)  =>  ( i i )  =>  (i),  this  would  then  complete  the  proof  of  Theorem  1. 

\ 

\ 

Lemma  3. (L^  version  of  Bochner's  theorem) . 

Let  K  be  an  absolutely  integrable  function  on  Rd  with  /K(x)dx  =  1,  and 

let  h=h  be  a  sequence  of  positive  numbers  satisfying  lim  h  =  0.  For  each 
n  n 

density  f,  we  have 


where 


lim  / | gh(x)-f ( x) |dx  =  0 
n 

gh(x)  =  h'd/K((x-y)/h)  f(y)  dy. 


Proof  of  Lemma  3. 

The  proof  is  based  on  a  technique  of  Kantorovich  and  Akilov  (1964).  I  am 
grateful  to  Laszlo  Gyorfi  for  pointing  this  reference  out  to  me.  We  let  C=/|K(x)|dx, 
and  note  that  by  a  change  of  integral,  for  any  function  f. 


/ I gh( x) 1 dx  <_  //h’d|K((x-y)/h) I ]f(y)|dy  dx  =  C/ |f(y)|dy. 


6 


For  each  c  >  0  there  exists  a  continuous  function  f*  vanishing  outside 
a  compact  set  (say,  where  is  the  closed  sphere  of  radius  r  centered  at  x) 
such  that  /|f(x)-f*(x) jdx  <  e.  Thus,  if  we  write  gh(f,x)  to  make  the  dependence 
upon  f  explicit,  then 


7 


8 


Proof  of  Lemma  5. 

The  proof  is  based  upon  a  Poissonization.  Let  N  be  a  Poisson  (n)  random 
variable  independent  of  ,  a  sequence  of  independent  {1,. . . ,k}-valued 

variables  distributed  according  to  P(Uj=i)=p.,  1  _<  i  <_  k.  Let  be  the  number 
of  occurrences  of  the  value  i  among  and  let  Xj  be  the  number  of 

occurrences  of  the  value  i  among  It  is  clear  that  Xj....,X£  are 

independent  Poisson  random  variables  with  means  np^,...,npk,  and  that  Xj,...,X^ 
is  a  multinomial  (n.Pj,.. . ,pk)  random  vector.  Since  E(X^)  =  np^ ,  we  have 


k  k  k 

y  — |X.-np.  |  <  y  -jX.-X'.j  +  y  E(— I X  !-np .  1 ) 
n1  i  1  n!  i  i1  ln'  i 


+  .1  £(|X;-np.|-E(|Xl-np.|)) 


^  ify/n  +  _^i(|x:-np.  |-E(|X‘-np.|)). 


By  the  Cauchy-Schwartz  inequality,  the  middle  term  in  (2)  does  not  exceed 
k  1/2  , 

(k  l  p./n)  =  /E7n  <  e/3  .  Now,  let  Z.=£|X!-np. ( ,  Y.=Z.-E(Z.).  Then, 

,j  _  ^  I  ••II  •  11  1 

k  k 

P(  I  -JlX.-np. I  >  e)  <  P(  y  Y.  >  e/3)  +  P( | N-n |  >  ne/3) 
i=l  n  1  1  i=l  1 


4  k  .  k  k 

\^/  r  rl\, H,  ,  r  V  r 


,  2  \  r-  /  «2  , 


<  (3/ej(  l  E(Y?)  +  6  y  l  E(Y‘)E(Y‘')  +  n'HE((N-n)H)  ). 
i=l  1  i=l  j=l  1  3 

j>i  (3) 


Now, 


E((N-n)^)  =  n+3n^  <_  4n^. 


(4) 
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Also, 

E(Y4)  =  E(Z4)  -  4E(zJ)E(Zi)  +  6E(Z*)(E(Z.))2  -  3(E(Z.))4 

1  E(Z4)  +  6(E(Z2))2 

=  n‘4(np.+3n2p2)  +  f>(np./n2)2 

=  9p?/n  +  p./nJ  , 
and 

E(Y2)  <  E(Z2)  =  p./n. 

Combining  these  inequalities  and  resubstituting  them  into  (3)  shows  that  the  left 
hand-side  of  (3)  is  not  greater  than 

k  k  k 

(3/e)4  (4/n2  +  l  (9p?/n2  +  p./n3)  +6  l  l  p.p./n2) 
i=l  1  1  i=l  vi=l  1  J 

j>i 

<  (3/e)4(4/n2  +  (9/n2)(Pl+...+pk)2  +  1/n3) 

<  (3/e)4(14/n2)  =  1134/(nZe4). 
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Lemma  6.  For  any  density  f  on  Rd,  and  any  absolutely  integrable  function  K 
with  /K(x)dx  =1,  Jn  -*■  0  completely  as  n  °°  whenever 

lim  h  =  0  and  lim  nhd  = 
n  n 

Proof  of  Lemma  6. 

Let  be  defined  as  in  the  statement  of  Lemma  3.  By  Lemma  3,  it 
suffices  to  show  that  /|fn(x)-g^(x) jdx  0  completely  as  n  +  ®.  Let  un  be  the 
empirical  probability  measure  for  Xj»...»X  ,  and  note  that 

fp(x)  =  h‘d/K((x-y)/h)  un(dy) . 

For  given  e  >  0,  find  finite  constants  M  '_,N,a^, . . . ,a^  and  disjoint  finite  rectangles 
Aj,...,An  in  Rd  such  that  the  function 

N 

K*(x)  «  I  a.  I.  (x) 
i=l  1  Mi 

satisfies  :  |K*|  £  M,  K*=0  outside  [-L,L]d,  and  f\ K(x)-K*(x) |dx  <  e.  Define 

g*  and  f*  as  g.  and  f  with  K*  instead  of  K.  Then 
•71  n  n  n 

/|fn(x)-gh(x)|dx  < 

/ I fn(x)-f*(x) 1 dx  +  / | f*(x)-g*(x) I dx  +  /|g£(x)-gh(x)|dx 
<_  /h_d/|K*((x-y)/h)-K((x-y)/h)|f(y)  dy  dx 
+  /h"d/|K*({x-y)/h)-K((x-y)/h)|pn(dy)  dx  +  /| f*(x)-g*(x)|dx 


1 


<  2c  +  /|f*(x)-g*(x)|dx 


by  a  double  change  of  integral.  But  if  w  is  the  probability  measure  for  f. 


N  j  . 

/|f*(x)-9u(x)idx  <  I(a.j/|h  /  f(y)dy  -h"d  /  u  ( dy )  | dx 
n  n  i=l  11  x+hA.  x+hAi  n 

_  j  N 

±  Mh  a  l  / 1  ia(x+hA.  )-p  ( x+hA . )  |  dx  . 
i=l  ini 


Lemma  6  follows  if  we  can  show  that  for  all  finite  rectangles  A  of  R^, 


h~d/[p(x+hA)-;in(x+hA)  (dx  -*  0  completely  as  n  + 


Choose  an  A,  and  let  c  >  0  be  arbitrary.  Consider  the  partition  of  into  sets 

B  that  are  d-fold  products  of  intervals  of  the  form  *(i-l)h/N,ih/N) ,  where  i  is 

an  integer,  and  N  is  a  fixed  constant  to  be  chosen  later.  Call  the  partition  v. 

d  d  _ 

If  A=  n  x.,x.+a.)  and  min  a.  >  2/N,  then  we  let  A*~  n  jx.+l/N,x.+a . - 1/N) . 

1=1  1  1  1  i  1  _  i  =  lLl  11 

Def i ne 

C  =  x+hA  U  B  Q.  x+h(A-A*)  =  C*  . 
x  Be*  x 

B^x+hA 

Clearly, 

Z|u(x+hA)-un(x+hA) J  dx 


£  fl  | p(B)-pn(B) | dx  +  /(p(Cx)+pn(Cx))dx. 

D  &V 

B^x+hA 


(5) 
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The  last  term  in  (5)  equals 


j  .  d  d 

2x(h(A-A*))  =  2hax (A- A* )  =  2h°(  n  a.  -  n  (a --2/N) ) 

i=l  1  i=l  1 


=  2hdX (A) ( 1-  n  ( 1-2/ (Na . ) )  <  4hdx(A)  j  aT1  /N  <  ehd 
1=1  1  1=1  1 


by  choice  of  N.  We  used  the  fact  that  for  any  set  C,  and  any  probability  measure 
v  on  the  Borel  sets  of  Rd,  /  v(x+hC)  dx  =  X(hC).  For  any  finite  constant  R  >  0,  we 
can  rewrite  the  first  term  in  (5)  as 


J  |w_(B)-p(B)| 

Bey  n 

BoSqr^ 


/  dx  + 
Bix+hA 


B-x+hAdX  ^n^S0R^_li^S0R^  + 


(6) 


Here  (.)c  denotes  the  complement  of  a  set. 
y(SgR)  <  e  by  our  choice  of  R.  Also, 


Clearly,  h~d  /  dx  _<  x(A),  and 
BS-x+hA 


P  VW' 


v(SQR)  >  t) 


<  e 


-2m 


by  Hoeffding's  inequality  for  binomial  random  variables  (Hoeffding,  1963).  Finally, 
since  the  collection  of  sets  Be-'*  with  B'fSg^  has  at  most  (2RN/h  +2)d  =  o(n) 
elements,  we  see  that  by  Lemma  5,  for  all  n  large  enough, 


P(  I  lvn(B)-u(B)|  >  e)  £  4^  ■ 

Btf  "  n  c 

which  is  summable  in  n.  Thus,  for  all  e  >0,  P( (6)  >  e)  is  summable  in  n.  This 
concludes  the  proof  of  Lemma  6. 
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2  Q 

Remark  3.  The  bound  0(n  j  in  Lemma  5  can  be  improved  to  0(n-C|)  for  any  q  >  1 
by  using  a  slightly  more  sophisticated  technique.  If  this  new  bound  is  used  for 
the  first  term  of  (6),  we  obtain  the  stronger  result  that  under  the  conditions  of 
Lemma  6, 

oo 

l  nq  P(J  >  e)  <  »  .all  q,e  >  0. 
n=l  n 

Lemna  7. (Nonexistence  of  unbiased  kernel  density  estimates) . 

Let  K  and  f  be  arbitrary  densities  on  R1^,  and  let  be  defined  as  in 
Lemma  3.  For  any  fixed  a  >  0, 

/|f(x)-g  (x) | dx  >  b. 

a 


Proof  of  Lemma  7. 

It  is  obvious  that  g= ( x)=E( f  (x))  is  the  density  of  X+Z  where  X  has  density 

a  n 

f  and  Z  is  independent  of  X  and  has  density  a-c*K(x/a).  If  $  and  i|»  are  the 
characteristic  functions  of  X  and  Z,  then  /|f(x)-g,(x)|dx  =  0  implies  f=g  for  almost 

a  » 

all  x,  and  thus,  4>(t)=<t>(t)^(t)  for  all  t  R^.  For  <j>(t)j*0  (i.e.,  at  least  in  a 
neighborhood  of  the  origin),  we  must  have  <j>(t)=l.  But  then  ip  cannot  be  the  characteristic 
function  of  a  density  on  R^,  and  we  have  a  contradiction. 

Lemma  8. (A  binomial  distribution  inequality) . 

If  N  is  a  binomial  (n,p)  random  variable,  then 

•I  e"4*iV(2im)  when  y^>P>^.n>4, 

Ellj-Pl)  i  _2  _  A  1  " 

e  p  when  p  <  —  ,  n  >  2. 

r  n  _ 
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Lemma  9. 

If  Xj,...»Xn  are  independent  identically  distributed  random  variables  taking 
values  in  ^O^u{c,m)  for  some  c  >  0,  then 

E(|lji(XrE(Xj))|)  »  c  E(|g-p|) 

where  N  is  a  binomial  (n,p)  random  variable  and  p=P ( X ^  >_  c). 
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Proof  of  Lemma  9. 

Let  X*  have  the  distribution  of  Xj  restricted  to  XJ  21  c.  Clearly, 

E( X* )=E( X^ I x^>c)/P=E(X1)/p.  If  N=  l  Ix  >c,  then 

E(l"J1(xi“E(xi))i) =  E(|ni1xi'H£(xi)  +  (rP>E0(i)i) 

=  E( | Y  +  (~p)E(X*)|)  (by  definition  of  Y) 

1  E(|E(Y|N)  +  (-~p)E(X|) | )  =  E(|{j-p|)E(XJ) 

t 

1  c  E(lg-p|). 

Lemma  10. 

If  Z  is  any  random  variable  taking  values  on  [o,c],  c  >  0,  then 
/ 

1  P(Z>  e)  ’  311  E  >  °- 

Thus,  if  Zj.Z^,...  is  any  sequence  of  ^),c]-valued  random  variables,  then  Zn  0 
in  probability  if  and  only  if  E(Zn)  0. 

Remark  4.  The  second  statement  of  Lemma  10  is  also  a  consequence  of  an  inequality 
due  to  Feller  (1971,  pp.  152):  P(Z  >  eE(Z))  >  (l-£)2/E(Z2)  >  ( 1- e ) 2/ ( cE( Z ) ) ,  0  <  e  <  1. 
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Lemma  11.  Let  K  and  f  be  densities  on  R^.  If  J  0  in  probability  as  n  -*  »,  then 


lim  h  =  0. 
n 


If  also  K  has  compact  support,  then 


1  im  nh^  =  ®. 
n 

Proof  of  Lemma  11. 

Since  <_  2  ifor  all  n,  J  -*  0  in  probability  if  and  only  if  lim  E(J  )  =  0 

n 

(Lemma  10).  If  is  defined  as  in  Lemma  3,  then 

E(Jn)  =  E(/|fn(x)-f(x)|dx)  >  / | E ( f n (x) )-f(x) 1 dx  =  / |gh(x)-f (x) | dx. 


Assume  first  that  lim  h  =  c  >  0.  Clearly,  when  c  <  °°, 
n 


/|9h(x)-f(x)|dx  >  /|gc(x)-f(x)|dx  -  /|gh(x)-gc(x)|dx.  (8) 

As  in  the  proof  of  Lemma  6,  we  find  for  each  e  >  0  finite  constants  M,L,N,a. . ,aw  >  0 

N  1  N 

and  disjoint  finite  rectangles  A.,...,AN  such  that  the  function  K*=  J  a. I. 

i  N  i  a.. 

satisfies  |K*|  <_  M,  K*=0  outside  |j-L,L^  ,  and  f\ K*(x)-K(x)  |  dx  <  e.  If  g*  is  defined 
as  with  K*  instead  of  K,  we  have 

/|9h(x)-gc(x)|dx  i  /|gh(x)-g*(x) | dx  +  /|gjx)-g£(x) | dx 

+  /|gf|(x)-g*(x)|dx 
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N  N  . 

£  2e  +  /|  l  a.p(x+hA.)h  -  )'  a.p(x+cA.)c  |  dx 

i=l  1  1  i=l  1  1 


£  2c  +  l  h  da./|p(x+hA. )-p(x+cA. ) |dx 
i=l  1  1  1 


+  l  c  ./ |p(x+hA. )-u(x+cA. ) |dx  +/  Y  a . I c~d-h~d | p(x+cA. )dx 
1=1  1  1  1  1=1  1  1 

£  2e  +  l  a  - ( | c  d-h  d|x(cA.)  +  (h~d+c~d)x(hA. acA. )) 

1=1  1  1  .  1  1 

=  2e  +  0 ( 1 ) .  (9) 


Since  e  >  0  was  arbitrary,  we  conclude  from  (8)  and  (9)  that 


lim  inf  /|g,(x)-f(x) jdx  =  /|g_(x)-f(x)|dx.  (10) 

n  "  L 


But  the  right-hand-side  of  (10)  is  positive  by  Lemma  7.  This  contradicts  the  fact 
that  the  left-hand-side  of  (10)  is  0.  Therefore,  no  subsequence  of  h  can  converge  to 
a  finite  positive  constant.  Assume  next  that  c=  °°.  We  will  show  that 


lim  inf  /|gh(x)-f(x) jdx  £  1,  (11) 


which  again  leads  to  a  contradiction  with  lim  E(J  )=0,  so  that  no  subsequence  of  h 

n 

can  diverge  either.  Therefore,  lim  h  =  0.  To  prove  (11),  we  take  an  arbitrary  c  >  0 

n 

and  find  M  so  large  that  /  K(x)dx  <  e.  Define  g£  and  g£*  as  g.  after  replacement 

K(x)>M  h  h  h 

of  K  by  KI^  and  KI^C  respectively,  where  A={x:  K(x)  £  M).  We  have 


t 
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/ |gh(x)-f (x) I dx  >  / I g* ( x)- f ( x) I dx  -  /g£*(x)  dx 


=  /|g(*(x)-f(x}Jdx  -  /  K(x)dx  >  /|g*(x)-f(x)  jdx  -  c. 

h  K(x)>M  h 


Also,  by  Fatou's  Lemma,  since  g*(x)=h 
<  Mh"d  ->  0  for  all  x. 


"d/K( (x-y )/h) I « ( (x-y)/h)f (y )dy  <  Mh*d  /  f(y)dy 
M  x-hA 


1 i m  inf  /|g*(x)-f{x) | dx  >_  /lim  inf  |g*(x)-f(x) |dx  =  1, 
n  n 


so  that  (11)  follows. 

We  now  assume  that  K  has  compact  support,  that  lim  E(Jn)  =  0  and  that 

d  ^ 

lim  nh  =  c  ,  c  >_  0.  By  Lenina  3,  / 1 g,  (x)-f (x)  | dx  +  0  as  n  °°.  Thus, 

n 

lim  inf  E(Jn)  >_  lim  inf  E  ( / 1  fp  ( x )- g^  ( x )  |  dx) .  (12) 

n  n 

We  will  show  that  the  right-hand-side  of  (12)  is  positive  for  any  c  0,  and  thus, 
since  the  left-hand-side  is  0,  we  must  conclude  that  for  all  subsequences  of  h, 
nhd  -*■  «.  Define  A={x:  cx(B)f(x)  >  1}  ,  where  B={x:  K(x)  b}  ,  and  b  >  0  is  a  constant 
that  will  be  defined  further  on.  Let  K*=KIg,  and  g*  uses  K*  in  its  definition,  then 

E(/|fn(x)-gh(x) | dx)  >  E(/|f*(x)-g*(x)|dx) 

-E(/h-d/ | K( ( x-y )/h )-K*( (x-y )/h )  dx) 

-E(/h~d/| K( (x-y)/h)-K*((x-y)/h) j  f(y)dy  dx) 

i 


=  E(/|f*(x)-g*(x)|dx)  -  2/ | K(x)-K*(x) | dx 
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T1 


=  E(/| f*(x)-g*(x) | dx )  -  2  /  K(x)  dx  .  (13) 

n  h  K(x)<b 

By  Lemma  9,  E(  |  f * ( x ) - g* ( x )  | )  >_  bh  ( | ~—P | )  where  N  is  binomial  (n,p)  and 

p=P( K*(  (x-X.  )/h )  _>  b)=P(K( (x-X.  )/h)  >_  b)  =  P(X.€. x-hB)=  /  f(y)dy.  Assume  that  n  ^  4. 

_2  -4  x-hB 

Let  r=min(e  ,e  /v2n).  8y  Lemma  8, 


Thus , 


E(|f*(x)-g*(x)|) 


rbh  d/p/n  >_  rbh  d/n  ,  1/  *^fT  >_  p  >_  1/n, 
-d 

rbh  p  ,  p  <  1/n. 


E(/|f*(x)-g*(x) | dx) 


(14) 


>.  rbh"d(^-A(x:n"1,/2>  /  f(y)dy>n-1)  +  /  f  f(y)dy  dx). 

!  x-hB  S  f(y)dy<l/n  x-hB 

x-hB 


Assume  throughout  that  a(B)  >  0.  Let  L  be  the  set  of  all  x  for  which  A~(x-hB)  /  f(y)dy 

x-hB 

tends  to  f(x)  as  h  +  0.  By  the  compactness  of  B,  L  includes  almost  all  x  (Lemma  4). 

We  claim  that  the  first  term  on  the  right-hand-side  of  (14)  tends  to  rbx(A)/c  when 

c  >  0.  Let  C  be  a  compact  set  of  Rd,  and  let  A  ={x:  l//n  >  /  f(y)dy  >  1/n}.  Clearly, 

d  "  x"hB 

AnHL  ->  AX,  and  nha  •+•  c.  Thus,  by  the  Lebesgue  dominated  convergence  theorem, 

/  dx  -*■  f  dx  =  /  dx,  which  is  arbitrarily  close  to  a(A)  by  choosing  C  large 

A  OLoC  A'iLX  AaC 

n 

enough.  For  c  =  0,  we  consider  the  second  term  of  (14)  only.  First,  for  all  x<sL 

with  f(x'  >  0,  the  condition  f  f(y)dy  >  —  is  violated  for  all  n  lame  enounh. 

x-hB  n 


Let  f.  ( x )  =  /  f(y)dy.  The  last  term  in  (14)  has  the  following  limit  infimum  : 

n  x-hB 


lim  inf  rbA(B)  _  /  A_*(hB)f,  (x)dx 

n  fh(x)£l/n  n 

1  rbx(B)  /lim  inf  Iy  (x)<1/nX~1(hB)  ?h(x)  dx 


=  rbA(B)  /f(x)  dx 


.  =  rbA(B). 


Here  we  used  Fatou's  Lemma  and  Lemma  4.  Ti\us,  the  limit  infimum  of  (13)  is  at  least 


rbA(A)/c  -2  /  K(x)dx  ,  c  >  0, 

K(x)<b 

rbA(B)  -  2  I  K(x)dx  ,  c  =  0. 

K(x)<b 

Choose  b  >  0  small  enough  such  that  a(B)  >  0  and  that  both  expressions  are  strictly 
positive.  This  can  be  done  since  for  c  >  0, 


lim  inf  A(A)/(^-  /  K(x)dx)  =  ® 

b  4  0  DK(x)<b 

and  for  c  =  0, 

lim  inf  A(B)/(i  /  K(x)dx)  =  ». 

a  4  0  °K(x)<b 


It  is  here  that  we  use  the  compactness  of  K.  This  concludes  the  proof  of  Lemma  11. 
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3.  DISCRIMINATION. 

We  would  like  to  point  out  one  important  application  of  Theorem  1. 

In  the  discrimination  problem,  we  are  given  a  sequence  (Xj,Y^),. . .  »(X  *Yn)  of 

independent  Rdx{ 1,. .. ,M}- valued  random  vectors  distributed  as  (X,Y)  but 

independent  of  (X,Y).  We  construct  an  estimate  Y  from  X  and  the  data  sequence, 

say.  Y  =  gn ( X ) .  The  probability  of  error  for  the  given  estimate  and  data  sequence 

is  L  =P(gn(X)^Y I Xj.Yj . X  ,Y  ) ,  and  this  is  always  at  least  equal  to  the  Bayes 

probability  of  error  L*=  inf  P (9 ( X )^Y ) .  If  X  has  a  density  f,  and  if 

g:Rd-*-{l, . . .  ,M  } 

we  construct  the  density  estimates 

H  -1  n 

f  ,(x)  =  (nha)  l  K((x-X  )/h)  IY  .  ,  1  <  i  <  M,  (15) 

n1  j=l  J  j  1 

and  if  we  define  g  (x)  as  the  first  integer  i  for  which  f  .(x)=  max  f  .  (x),  then 

l±k£.M  nK 

how  is  L  related  to  L*  ?  In  other  words,  in  what  senses  does  L  converge  to  L*  ? 

n  n 

The  simple  rule  mentioned  here  can  be  found  under  the  name  "potential  function 
method"  in  the  Russian  literature  (see  e.g.  Bashkirov,  Braverman  and  Muchnik  (1964)). 
Its  properties  were  subsequently  studied  by  Van  Ryzin  (1966),  Rejto  and  Revesz  (1973), 
Glick  (1972,1976),  Greblicki  (1978),  Devroye  and  Wagner  ( 1980a ,1980b)  and 
Spiegelman  and  Sacks  (1980).  In  this  note,  we  can  offer  the  following  result  : 

Theorem  3. 

Let  K  be  an  absolutely  integrable  function  with  positive  integral  over  Rd, 
and  let  X  have  a  density  f.  Then  the  discrimination  rule  defined  by  (15)  satisfies 

Oo 

y  nq  P(L  -L*  >  e)  <  co  ,  all  q,e  >  0, 
n=l  n 

lim  h  =  0,  and  lim  nhd  =  «. 
n  n 


whenever 
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Remark  5.  Theorem  3  contains  all  previously  known  consistency  results  for 

the  discrimination  rule  (15)  that  are  based  on  the  assumption  that  X  has  a 

density  f.  With  additional  conditions  on  K  (i.e.,  c.^  1  K  >_  c^I^  for 

1  *0rl  c  50r2 

some  Cj.C2.rj.r2  >  0),  we  know  that  Ln  -*  L*  in  probability  for  all  distributions 
of  (X.Y)  (Devroye  and  Wagner  (1980);  Spiegelman  and  Sacks  (1980)).  If  we  also 
ask  that  rj=r2  and  nhd/log  n  -*•  »,  then  Ln  -*■  L*  almost  surely  for.  all  distributions 
of  (X.Y).  From  our  Theorem  3,  it  is  clear  that  the  condition  nhd/log  n  -=»  is 
not  needed  whenever  X  has  a  density. 


Proof  of  Theorem  3. 


1  n 

We  introduce  some  new  notation  :  p.=P(Y=i),  pn^=  —  l  Iy  ,  f..  i 

M  j  =  1  j 


s  the 


density  of  X  given  that  Y=i,  and  fnQ= J  f  ^ .  Then,  by  (12)  of  Devroye  and  Wagner  (1980b), 

M  f  .(x)  \>JAx) 

Ln'L*  -  Jj  7TxT  lf(x)dx 

“  J/lpif1<x)-fn,-(x>|dx  +  J/ 


dx 

M 


i  I  P„infj(x)-p-^l<lx  +  /|f(x)-fn0(x)|dx  +  I  |P,-P„i 
1  =  1  l-l 


M  f  .(x)  M 

<2  1  P./I^(x)--21—  |  dx  ♦  .1  |prpn1|. 

1=1  rni  1=1 


Let  us  look  at  i=l  only.  By  Hoeffding's  inequality  (Hoeffding,  1963), 
P(|PfPnil>c)  <  2exp(-2nc2),  all  c  >  0.  Assume  that  Pj  >  0,  and  let  N=nP(ij. 
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Note  next  that  E(fnl(x)/pn^ | N) ( x)  (defined  in  Lemma  3)  if  f  is  replaced  by  f^. 

Thus, 

f,(x) 

Mx 

1  pnl 

(16) 

fn](x) 

£/|f1(x)-gh(x) | dx  IN>Q  +  / | gh(x)  —  |dx  IN>Q  +  2IN=0- 

nl 

The  first  term  on  the  right-hand-side  of  the  inequality  tends  to  0  as  h  •+  0  by 
Lemma  3.  Conditional  on  N,  the  second  term  is  distributed  as  / 1 E ( f ^ ( x ) ) - f m ( x ) j dx 
where 

fN(x)  =  (Nhd)_ 1  J  K( (x-X. )/h) 

"  i=l  1 

and  are  independent  random  vectors  with  common  density  fj.  In  the  proof 

of  Theorem  1,  we  have  seen  that  for  every  e  >  0  there  exist  positive  constants  c i 
only  depending  upon  e  ,  K  and  f^  such  that 

P(/|t(fN(x))-fN(x)|dx  >  e | N)  <  Cl/Nq, 
valid  when  (c^/h  +l)d  <  c^N. 

Thus , 

f  (  X  ) 

p(/| gh ( x) — — - J  dx  IN>0  >  e)  <  P(N  <  np  J2)  +  Cl(np  /2)'q  , 

pnl 

valid  when  (c^/h  +l)d  <  np^  . 

Since  nhd  ->  the  last  inequality  is  valid  for  all  n  large  enough.  The  term 

o 

P(N  <  np1/2)  does  not  exceed  exp(-np^/2)  by  Hoeffding's  inequality,  and  the  last  term 
of  (16)  is  treated  similarly.  Theorem  3  now  follows  by  the  arbitrariness  of  e  and  q. 


j 


4.  ACKNOWLEDGEMENTS. 


This  paper  was  written  during  the  suniner  of  1931  while  the  author  was 
visiting  Applied  Research  Laboratories  and  the  University  of  Texas  at  Austin. 

I  wish  to  thank  Charles  Baker  and  the  Office  of  Naval  Research  for  the  generous 
support. 


25 


REFERENCES 

•  ABOU-JAOUDE,  S.  (  1976a).  Sur  une  condition  necessaire  et  suffisante  de  L ^-convergence 

presque  complete  de  l'estimateur  de  la  partition  fixe  pour  une  densite. 

C.  R.  Acad,  Sci.  Paris,  Serie  A  283  1107-1110. 

*  ABOU-JAOUDE,  S.( 1976b).  Sur  la  convergence  et  de  l'estimateur  de  la  partition 

/  ^ 

aleatoire  pour  une  densite.  Ann.  Inst.  Henri  Poincare  12  299-317. 

«  ABOU-JAOUDE,  5. (1976c).  Conditions  necessaires  et  suffisantes  de  convergence  en 
probabilite  de  1 'histogramme  pour  une  densite.  Ann.  Inst.  Henri  Poincare  12 
213-23'* . 

BASHKIROV,  O.A.,  BRAVERMAN,  E.M.  and  MUCHNIK,  I. E. (1964).  Potential  function 

algorithms  for  pattern  recognition  learning  machines.  Automation  and  Remote 
Control  25  692-695. 

.  BEAN,  S.J.  and  TSOKOS,  C.P.(1980).  Developments  in  nonparametric  density  estimation. 
Int.  Statist.  Rev.  48  267-287. 

.  BRETAGNOLLE,  J.  and  HUBER,  C.(1979).  Estimation  des  densites  :  risque  minimax. 

V 

Zeitschrift  fur  Wahrscheinl ichkei tstheorie  und  verwandte  Gebiete  47  119-137. 
DEHEUVELS,  P.(1974).  Conditions  necessaires  et  suffisantes  de  convergence  ponctuelle 

A  A 

presque  sure  et  uni  forme  presque  sure  des  estimateurs  de  la  densite.  C.  R. 
Acad.  Sci.  Paris,  Serie  A  278  1217-1220. 

DEVROYE,  L.(1979).  On  the  pointwise  and  the  integral  convergence  of  recursive  kernel 
estimates  of  probability  densities.  Uti litas  Mathematica  15  113-128. 

DEVROYE,  L.  and  WAGNER,  T . J . ( 1979 ) .  The  LI  convergence  of  kernel  density  estimates. 
Ann.  Statist.  ]_  1136-1139. 

DEVROYE,  L.  and  WAGNER,  T.J.( 1980a).  On  the  LI  convergence  of  kernel  estimators  of 

ii 

regression  functions  with  applications  in  discrimination.  Zeitschrift  fur 
Wahrscheinl ichkei tstheorie  und  verwandte  Gebiete  51  15-25. 


26 

DEVROYE,  L.  and  WAGNER,  T.J.( 1980b).  Distribution-free  consistency  results  in 
nonparametric  discrimination  and  regression  function  estimation.  Ann . 

Statist.  8  231-239. 

FELLER,  W.(1971).  An  Introduction  To  Probability  Theory  And  Its  Applications,  vol .II, 

2nd  ed.,  John  Wiley,  New  York. 

FRAME,  J . S . ( 1945 ) .  Mean  deviation  of  the  binomial  distribution.  American  Mathematical 
Monthly  52  377-379. 

GLICK,  N.(1972).  Sample-based  classification  procedures  derived  from  density  estimators. 
J.  Amer.  Statist.  Assoc.  67  116-122. 

GLICK,  N . ( 1974 ) .  Consistency  conditions  for  probability  estimators  arid  integrals  of 
density  estimators.  Uti litas  Mathematica  6  61-74. 

GLICK,  N. ( 1976 ) .  Sample-based  classification  procedures  related  to  empiric  distributions. 

IEEE  Trans,  on  Information  Theory  IT-22  454-461. 

GREBLICKI,  W. ( 1978) .  Asymptotically  optimal  procedures  with  density  estimates.  IEEE 
Trans,  on  Information  Theory  IT-24  250-251 . 

HOEFFDING,  W.(1963).  Probability  inequalities  for  the  sum  of  bounded  random  variables. 

J.  Amer.  Statist.  Assoc.  58  13-30. 

JOHNSON,  N.L.  and  KOTZ,  5.(1969).  Distributions  In  Statistics:  Discrete  Distributions. 
John  Wiley,  New  York. 

KANTOROVICH,  L.V.  and  AKILOV,  G.P.(1964).  Functional  Analysis  In  Normed  Spaces. 

Pergamon  Press. 

*  PARZEN,  E . (  1962 ) .  On  the  estimation  of  a  probability  density  function  and  the  mode. 

Ann.  Math.  Statist.  33  1065-1076. 

REJTO,  L.  and  REVESZ,  P.(19/3).  Density  estimation  and  pattern  classification. 

Problems  of  Control  and  Information  Theory  2  67-80. 

*  ROSENBLATT,  M.(1956).  Remarks  on  some  nonparametric  estimates  of  a  density  function. 


Ann.  Math.  Statist.  27  832-837. 


27 


SCHEFFE,  H . ( 1947 ) .  A  useful  convergence  theorem  for  probability  distributions. 

Ann.  Hath.  Statist.  18  434-458. 

SPIEGELMAN,  C.  and  SACKS,  J.(1980).  Consistent  window  estimation  in  nonparametric 
regression.  Ann.  Statist.  8  240-246. 

STEIN,  E.M.(1970).  Singular  Integrals  And  Differentiability  Properties  Of  Functions. 

Princeton  University  Press,  Princeton,  New  Jersey. 

VAN  RYZIN,  J.(1966).  Bayes  risk  consistency  of  classification  procedures  using  density 
estimation.  Sankhya,  Series  A  28  161-170. 

•  WEGMAN,  E.J.(1972).  Nonparametric  probability  density  estimation:  I.  A  summary  of 

available  methods.  Technometrics  14  533-546. 

•  WERTZ,  W.(1978).  Statistical  Density  Estimation.  A  Survey.  Vandenhoeck  and  Ruprecht, 

Gottingen,  Applied  Statistics  and  Econometrics  Series,  Vol .  13. 

•  WERTZ,  W.  and  SCffNEIDER,  8.(1979).  Statistical  density  estimation  :  a  bibliography. 

Int.  Statist.  Rev.  47  155-175. 

WHEEDEN,  R.L.  and  ZYGMUND,  A. (1977).  Measure  And  Integral.  Marcel  Dekker,  New  York. 

Luc  Devroye 

School  of  Computer  Science 
McGill  University 
805  Sherbrooke  Street  West 
Montreal ,  Canada  H3A  2K6 


I 


ARL-TP-82-31 
2  August  1982 


•Ar 

★  ★★ 


ON  THE  INCONSISTENCY  OF  BAYESIAN  NON-PARAMETRIC  ESTIMATORS 
IN  COMPETING  RISKS/MULTIPLE  DECREMENT  MODELS 


by 


Barry  C.  Arnold* 
Patrick  L.  Brockett** 
Willi  am  Torrez* 

A.  Larry  Wright*** 


Department  of  Statistics,  University  of  California,  Riverside,  Californi 
Department  of  Finance  and  Applied  Research  Laboratories,  The  University 
of  Texas  at  Austin,  Austin,  Texas 

Department  of  Mathematics,  University  of  Arizona,  Tuscon,  Arizona 


This  paper  has  been  submitted  to  IEEE  Transactions 
on  Reliability  Theory  for  publication. 


This  research  supported  in  part  by  the  Office  of  Naval 
Research  under  Contract  N0001 4-81 -K-01 45 . 


§1.  Introduction 


The  problem  of  analyzing  the  competing  risks  of  various  diseases 
has  been  studied  in  biometric,  statistical  and  actuarial  literature  since 
Daniel  Bernoulli  first  (1760)  calculated  the  effect  that  the  elimination 
of  a  specific  disease  (in  this  case  smallpox)  would  have  upon  the  general 
mortality  rate  structure  of  the  population.  This  problem  also  is  of  vital 
importance  in  engineering  reliability  computations  of  complex  systems 
and,  under  the  guise  of  joint  life  status,  in  multiple  decrement  life  tables 
in  actuarial  calculations  (cf.  Jordan  (1975)  Chapter  19). 

The  usual  mathematical  formulation  of  the  problem  is  as  follows.  Given 

k  possible  mutually  exclusive  and  collectively  exhaustive  risk  factors 

(diseases,  components  which  could  cause  system  failure,  etc.),  let  X^, 

i  =  1,  2,...,  k  denote  the  so-called  net  lives,  i.e.,  the  time  which 

would  expire  until  risk  factor  1  operating  alone  would  cause  system  failure 

(death  of  the  individual  or  failure  of  a  mechanical  or  electrical  mechanism). 

The  observable  variable  for  each  member  of  the  population  is  Z  =  ,mln,  X., 

lf.if.k  l 

the  failure  time  of  the  system  (the  actual  time  of  death  of  the  person). 

Often  (but  not  always)  we  are  also  able  to  ascertain  which  of  the  k 
competing  risk  factors  is  responsible  for  the  failure.  Using  whatever 
data  is  available,  our  desire  is  to  estimate  the  joint  probability 
distribution  measure  for  X^,  X^ ,  ...,  X  .  Using  this  estimated  probability 
measure  we  may  consider  interactions  of  the  net  lives,  construct  cause 
specific  insurance  policies,  and  subsequently  set  policy  premiums  for 
insuring  losses,  and  design  backup  systems  with  appropriate  reliability. 

There  are  several  problems  in  implementing  the  above  procedure,  the 
first  and  foremost  of  which  is  that  in  general  the  knowledge  of  the  time 
and  cause  of  death  or  failure  alone  does  not  usually  uniquely  determine  the 


_2- 

joint  measure.  If  we  let  J„  denote  the  cause  of  failure  of  individual 

V- 

and  Z,  =  •“n,  X  .  ( .. )  the  actual  time  of  failure,  then  Tsiatis  (1975)  has 

1  <  L^k  i 

shown  that  the  data  (Z  J  ),  ....  (Z  ,J  )  need  not  uniquely  determine  the 

t  i  mm 

joint  measure  Fq  of  (X^,  , X  )  .  If  the  variables  are  assumed  to  act 

independently,  then  Berman  (1963)  has  shown  that  the  data  uniquely  determines 
Fq .  Within  various  parametric  families,  Nadas  (1971)  has  shown  this  in 
the  case  where  the  X's  are  multivariate  normal,  and  David  and  Moeschberger 
(1978)  discuss  other  joint  models  such  as  multivariate  exponential. 

In  many  situations  the  assumption  of  mutual  independent  of  the  X's 
is  not  valid  (morbidity  and  mortality  from  different  causes  tend  to  be 
related  and  component  failures  may  tend  to  be  related).  When  additional 
auxiliary  information  is  available  concerning  the  underlying  multivariate 
dependence  structure,  it  may  not  be  compatible  with  the  few  uniquely 
identifiable  distributions  mentioned  above.  Moreover,  in  the  case  of 
vague  information  (e.g.,  the  patient  died  but  the  cause  was  not  known, 
and  the  time  is  only  approximate)  traditional  methods  fail  to  be  able  to 
incorporate  this  information.  Additionally,  one  may  not  be  precisely 
certain  of  the  parametric  class  of  distributions  under  study. 

In  such  a  situation  one  is  tempted  to  adopt  a  Bayesian  approach. 

Using  prior  information  one  would  obtain  a  prior  "best  guess"  for  the 
joint  probability  law  of  the  net  lives,  and,  appealing  to  the  traditional 
folklore,  one  would  at  least  expect  the  posterior  Bayesian  estimator  to 
be  a  consistent  estimator  of  the  true  joint  probability  law.  In  this 
paper  we  shall  derive  the  Bayesian  non-parametr ic  estimator  using  mixtures 
of  Dirichlet  processes  (section  2).  In  section  3  we  illustrate  the 
technique  on  a  bivariate  censoring/death  model  with  a  joint  Pareto  law. 

In  section  A  we  show  the  (at  first  s  irprising)  result  that  these  Bayesian 
estimators  are  inconsistent  if  n  >  2  except  in  possibly  serendipitous  cases. 


52.  Dir ichlet  processes,  mixtures  of  Dirichlet  processes  and 


Bayesian  analysis  of  competing  risks. 

In  order  to  do  Bayesian  analysis  of  competing  risks,  we  must  begin 
with  a  method  for  putting  a  prior  distribution  on  the  set  of  all  joint 
probability  measures  on  (  B  )  \  following  Ferguson's  (1973)  lead  we 
shall  define  such  a  prior  distribution  via  the  Dirichlet  distribution. 
Since  a  stochastic  process  is,  by  definition,  a  fumtion  space 
valued  random  variable,  we  call  the  resulting  prior  distribution  a 
Dirichlet  process,  and  define  it  uniquely  by  specifying  its  finite 
dimensional  marginals.  We  proceed  as  follows. 

We  will  say  that  an  r  dimensional  vector  X  =  (X}  ,  .  ,  .  ,  X^)  has  a 

Dirichlet  distribution  with  parameter  a  =  (a^,  a^ ,  . ...a  )  £  TR  if  the 
density  of  (X^,  ...,  X  is  of  the  form 


f(x..:a)  = 


r  (  Z  a  . ) 

i=l  L 
r 

r(a.) 

i=i  1 
0 


a  -i  a.-l 
‘l  X2 


r-1 

a  -1  if  Z  x .  <  1  x  ■■  fl 
r  i  ~  i 


i=l 


(2.1) 


otherwise 


where  x 

r 


r-1 

1  -  2  x. 

i=l  1 


We  may  define  a  Dirichlet  stochastic  process  via  its  finite  dimensional 
marginals  as  follows.  Let  a  be  a  finite  measure  over  (]R  "*”)*,  and  let  A 
a2’  •••’  Am  be  an  arbitrary  partition  of  OR+)k.  The  (random)  measure  F  on 
(IR  )  has  a  Dirichlet  process  distribution  if  the  vector  (F(A^) ,  F(A0), 
F(Am))  has  a  Dirichlet  distribution  with  a  parameter  a  =  (n(A1) ,a(Aj , 
a(Am))  for  every  partition  A^  ...,  Am>  Ferguson  (1973)  showed  that  this 


-4- 


collection  of  finite  dimensional  marginals  uniquely  defines  a  stochastic 

process  D(  0  called  the  Dirichlet  process.  Note  that  the  mean  is  E(D(u))  = 
+  k 

a/a  ((IR  )  )  so  that  we  may  interpret  a  as  our  best  prior  assessment  of  the 

+k 

joint  measure,  and  the  total  mass  _i(IR  )  as  an  indicator  of  our  confidence 
in  this  prior  estimate. 

We  apply  Bayes  Theorem  using  the  prior  Dirichlet  process  distribution 

on  the  parameter  F,  and  the  likelihood  function  of  the  data  to  obtain  a 

posterior  distribution  on  F  which  will  be  a  mixture  of  Dirichlet  processes, 

say  D(2) ,  where  iB  is  a  random  measure.  Our  posterior  estimate  of  the  joint 

probability  measure  will  be  m  =  E(D(3))  =  E[3/z(IR  ')  ]  • 

P 

The  following  theorem  can  be  derived  from  Antoniak  (1974).  It  allows 
an  iterative  incorporation  of  data  observations  as  they  become  available. 


Theorem  1  If  the  prior  distribution  of  the  parameter  F  (the  joint  probability 
distribution  measure)  is  a  mixture  :  D(a^)d h(A)  of  Dirichlet  processes  with 
confidence  coefficient  ^(IR4^)  constant  in  AeA,  an  indexing  set,  then  for 
any  data  of  the  form  (X^,  X0 ,  ...,  X^)  tB,  where  B  is  a  Borel  set  with 
a-^(B)>  0  for  all  A  ,  the  posterior  distribution  of  F  is  also  a  mixture  of 
Dirichlet  processes. 


i  *  (;> + 


(dz ! B) u(dA)  . 


(Note  that  +  I(z)  is  a  random  measure  whose  value  on  a  set 
W  is  ^ (’■-')  +  I(z).) 

As  a  notational  convenience  we  shall  use  the  notation  D(a)  to  denote 


a  Dirichlet  process  with  prior  measure  a  or,  in  the  case  in  which  a  is  a 
random  measure,  to  denote  the  mixture  of  Dirichlet  processes.  For  example. 


-J- 


if  we  bo;,  in  with  non-random  prior  measure  t  and  observe  the  data 

(X  ...  X  )eB  then  the  posterior  measure  has  the  distribution  (by 
L  ’  k  1 

Theorem  1)  ,•  iven  bv  the  mixture 


!H  I  .  tz)>  -Uz‘  B. 

t  *  )  l 


D(  i+I  (  U^) 


u(Wfl  B  ) 

where  Z^  is  a  random  variable  with  probability  measure  Tt ( '•••  |B^)=  ^ ^ ^  "j  1  • 

The  mean  or  posterior  measure  is  the  expected  value  of  the  posterior 
process,  m^,  and  has  the  value  on  the  set  W  of 

m  (W)  =  F.!.i(W)  +  I,,(Z.  )]/(;.  UR 

p  ■'  1 

;(W)  +  .(V.’IB) 

> ( IR  rk)  +  1 

(11  (?) 

Similarly,  the  data  X  rB  ,X  cB,  yields  a  posterior  distribution  process 
which  is  the  mixture  (by  Theorem  1  and  the  above  calculation) 

D(a+I(  )(Z1)+I(.  )  (Z2|Z1>) 


where  has  the  probability  measure  ot ( • j  B^)  as  above,  and  Z1 | -  z  has 
the  probability  measure  (  *  I  B -> )  where  Cx^  =  a  fl ^  (z)  .  Using  the  fact  that 


for  a  set  W 


E(VVZ1» 


«V 


•iCJ|B2)j(l>i]ll  )  +  '1<iXXb.  |  B,)a(  1  (W  |  B,) 

/  i  \  c  '  !  B ^ ) 

+  ./u(w;b -)(X(B,0W  |b.)  -  — 7~ - 

1  aU;(B „) 


-6- 


where  ^ (W)  -  a(W)+l,  it  follows  that  the  posterior  measure 
sat isf ies 


m  (W)  [a(lR+k)+2] 
P 


=  e[o(w)+iw(z1)+iw(z2|z1)] 


=  c^W)+a(Vl|B  )+a(wlB  )a(B2  |b  ) 


+  a(1)(w|B2)a(w  B2|bi)  +  U(1) (w|B2)a(B2  wc|bi) 


a(B2  W  IV 

a(1)(V 


In  the  case  where  B.nB2  =  p  this  reduces  to  the  simple  relation 
m  (W)  =  [a(W)+a(w[B1)+a(w|B,)  ]/[a(IR+k)+2  ]. 

Thus,  analogously  for  data  X^CB^,  X^cB,,,  X^n^eB  with  mutually 

disjoint  Borel  sets  B^,  ...  >  the  posterior  Bayes  estimate  of  the 

joint  probability  measure  is 

n  . 

«_(W)  =  [a(W)  +  E  a(w|B.)]/[a(lR  )+n]  .  (2.2) 

P  i=l  1 


As  will  be  seen  in  section 
naturally  in  the  context  of 


3,  the  case  in  which  B.flB. 

i  j 

competing  risks. 


i>  occurs  quite 


§3.  Analysis  of  survival  time  distribution  with  a  random,  non-independent 
censoring  mechanism. 


A  particularly  important  special  case  of  the  preceding  analysis 
involves  the  situation  of  only  two  factors  —  occurrence  of  the  event  under 
study,  or  censoring  (which  includes  loss  to  follow  up,  dropping  out  of  the 
study,  failure  due  to  causes  other  than  the  event  of  interest,  and  even 
being  alive  at  the  termination  of  the  study  period).  In  general,  censoring 
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is  not  independent  of  failure  due  to  the  cause  of  interest,  and  our 
previous  theoretical  development  may  be  used  to  obtain  an  explicit 
formula  for  the  survival  curve  in  light  oi  the  prior  information  and 
the  dependent  censoring  and  failure  time  data. 

For  illustrative  purposes,  we  shall  assume  the  joint  probability 
measure  for  the  survival  and  censoring  variables  are  related  via  a  bi¬ 
variate  Pareto  law.  Thus,  the  joint  measure  of  the  set  [x  ,•'■)  x[x.;,™) 

=  W  is  of  the  form 

ct(W)  =  K[l+  (Xl/aL)  +  (x2/o,)]“Y,  >  0,  x.  >  0,  Y  >  0.  (3.1) 

+2 

The  constant  K  =  a(lR  ),  as  before,  has  a  "prior  sample  size"  inter¬ 
pretation,  or  confidence  interpretation.  The  constants  and  0o  are 
scale  parameters,  and  the  parameter  Y  is  interpreted  as  an  association 
parameter.  The  bivariate  Pareto  law  is  a  particularly  beautiful  model 
which  assumes  algebraically  decreasing  survival  in  each  variable 
separately.  If  Y  >  ?  we  may  calculate  (see,  e.g.,  Arnold  (1982)  Chapter  6) 

e(x.)  =  yat  / (Y-l) 

Var(Xi)  =  Ya.2/[(Y-l)2(Y-2)] 

Cov(X1,X2)  =  °1a2  / ((Y-l  )~(Y-2)  ]  (3.2) 

Corr(X1,X2)  =  y"1  . 

Additionally,  the  bivariate  Pareto  has  the  desirable  property  that  the 
marginal  and  conditional  distributions  of  X^  and  X,,  are  univariate  Pareto. 

Let  us  now  proceed  to  derive  the  posterior  measure  m^  for  the  joint 
survival  censoring  problem.  The  marginal  posterior  survival  curve  for 
the  variable  X^  alone  is  then  found  by  using  the  formulae 
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S(x)  =  m  (X  >  x)/m  (K+2) 
pi-  p 


(3.3) 


Returning  for  a  moment  to  Theorem  1,  we  first  note  that  if  the  data 

we  obtain  is  of  the  form  (X^  >  t,  X0  =  t)  or  (X^  =  t ,  X?  >  1 1  ,  then 

ct(B)  =  0,  so  a  slight  modification  is  necessary.  This  is  easily  accomplished 

by  taking  (X^,X2)e  B  where  B^  =  {X^  >  t,  |  X^  -  t|  _<  d  or  (X^  >  t ,  |  —  1 1  £ } 

respectively,  and  then  taking  the  limit  as  e^O.  The  resulting  equations 

for  m  now  become 
P 

n  - 

it  (W)  =  [a(W)  +  i  o(W  |  B .)  ]  /  [a(K  )+n]  (3.4) 

P  4-1  1 


where  u(w|b^)  is  to  be  interpreted  as  the  conditional  a  measure  given  the 
line  segment  X^  >  t,  X,  =  t  or  X,  >  t,  X  =  t  respectively,  i.e., 


ct(wj Bi) 


lim 

£—0 


a(WPB  ) 
~  a(Bj 


For  the  bivariate  Pareto  law  (3.1)  we  have  the  conditional  survival 
law  of  X^  given  X2  =  x2  univariate  Pareto  with  scale  parameter  o.[l+x0/a9] 
and  exponent  y  +  1,  i.e.. 


>_  x1|  X2=x2] 


(3.5) 


If  an  individual  is  censored  at  time  t,  the  information  furnished  is 

X:  >  t,  X2  =  t,  while  if  an  individual  fails  the  information  furnished  is 

of  the  form  X^  =  t ,  X7  >  t.  Using  (3.5),  we  find  the  conditional  probability 

cUxL  x|x  >  t,  x2  =  t)  = 

n^+ta^+to, 
a^a2+to^+xoT 


uP[X1  >  x|xx  -•  t,  X2  =  t) 


Y+l 

,  X  >  L 
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We  are  now  ready  to  specify  the  conditional  survival  function  for 
given  censored  and  failure  time  data. 


Theorem  2  Consider  the  failure  time-censoring  model  with  bivariate 

prior  random  measure  which  follows  a  Dirichlet  process  with  prior  expecta¬ 
nt) 

tion  measure  - tj-  =  P.t(‘)  given  hy  the  Pareto  law  (3.1).  Let  f  ,f?, 

a(TR  )  1  ~ 

f  denote  the  observed  failure  times  and  c,,  c„  ,  ....  c  the  observed 
r  1  2  m 

censoring  times  for  a  sample  of  size  r+m  individuals,  each  of  whom  either 
fails,  or  is  censored  (but  not  both).  Then  the  posterior  random  measure  is  a 
mixture  of  Dirichlet  processes,  and  the  posterior  survival  function  for  X 
given  the  data  and  prior  information  is 

S  (x)  =  P(X  >  x]  =  m  ([.x,°°)x  ]R+) /m  .  +2. 

1  P  /  pOR  ) 

+  m 

n([x,°°)x  m  )  +  N  (x)  +  r  P  tx,  >x|  X  >0  .  ,x.,=c  .  ] 

_  _ _ I _ 1-1  0.  1 _ 1  1  J.  l 

(  -tr*  v 

a(TR  )  +  r  +  m 


ci(IR+2)  {l+x/c'1  ]  {  +  Nf(x)  +  1 

o  a.+c  .  n  +c  ,rt 

12  i  1  i  2 

f+1 

+  N  (x) 

c  ,<X  j 
1  1 

!  aiVci°rx  °2 

c 

+2 

a(IR  )  +  r  +  m 

where  N  (x)  is  the  number  of  failures  which  occur  after  time  x  and  N  (x)  is 
1  c 

the  number  of  individuals  censored  after  time  x.  The  proof  of  Theorem  2 
follows  directly  from  Theorem  1,  and  from  calculations  involving  the  marginal 
and  conditional  distributions  of  multivariate  Pareto  laws. 

It  should  be  noted  that  vague  end  point  information  such  as  "death  at 
time  t,  but  cause  of  death  unknown"  can  be  written  as  (X  X,)..B  where 
B  -  {X1=t,X2?t.»UlX1M  ,  X2=tl.  Similarly  if  the  exact  time  of  death  is  only 
known  to  be  ^  _<  Xj.  <t2,  we  may  incorporate  this  knowledge  since  it  is  also 
of  the  form  (X^JcB  where  B  =  it^X,^,  X^t,}.  More  exotic  forms  of 
information  can  also  be  incorporated  by  the  same  method. 
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§4.  Inconsistency 

It  is  tempting  to  assert  that,  as  more  and  more  d\ta  is  collected, 
the  influence  of  the  assumed  prior  (determined  hy  a)  on  the  estimated  value 
of  F  should  diminish.  Indeed,  this  is  true  for  measures  on  IR  (cf.  Susarla 
and  Van  Ryzin  (1976)  or  Blum  and  Susarla  (1977)  or  liha  t  tacharya  (1981), 
Unfortunately,  in  the  competing  risks  setting,  such  is  rarely  the  case.  We 
can  illustrate  this  distressing  situation  in  the  context  of  our  failure  time- 
censoring  model.  Consider  the  posterior  estimated  survival  function  in 
Theorem  2.  The  expression 

S(x)  =  P  [X  >  x]  =  ryi([x,<°)  xlR+2  /  m  (]R  +^\ 

1  V  /  P 


a(  lx,00)  x  IR  )  +  N  (x)  +  I  P  [X.  >  x|  X,  >  c  .  ,X-,  =  c.] 
f  a  1  1  1  l’  2  l 

a(IR+2)  +  r  +  m 


is  valid  for  a  general  prior  measure  c>.  It  is  not  necessary  that  it  cor¬ 
respond  to  a  Pareto  law.  Imagine  that  we  have  available  n  observations 
(a  random  number  r^  are  failure  times  and  the  remaining  random  number 
mn  are  censoring  times) .  The  strong  law  of  large  numbers  applies 
and  S(x)  converges  almost  surely.  However,  after  some  calculation  we  find 
that  the  limit  will  coincide  with  the  true  marginal  distribution  F^  if  and 
only  if  true  joint  distribution  and  the  prior  guess  a  are  related  by 


iX  F(X  >  vjx 

0  1  * 


v) 


P  (x7 

a  1 


Vxi 


v) 

V) 


d  F2(v) 


F(Xl  >  x|x2  =  V)  d  F2(v),  all  X. 


(4.1) 
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A  sufficient  condition  for  this  is  that 
a  =  c  F 


(4.2) 


A  weaker  sufficient  condition  is  that 

.ill 

Pa(Xl  >  XIX2  =  v)  !'(Xl  "  xiXn  = 

Pa(Xl  >  vlx2  =  v)  "  >  v  i  X0  =  v)’ 


(4.3) 


We  cannot  expect  to  guess  right,  i.e.,  we  cannot  expect  (4.2)  to  hold 
We  can  interpret  (4.3)  as  the  requirement  that  F  and  a  assign  proportional 
densities  to  half-lines  of  the  form  {(x,v):  x>v)  where  v  is  a  fixed  positive 
number.  Again  it  would  be  excessive  serendipity  if  this  were  to  occur. 
Generally  speaking  tho  effect  of  a  cannot  be  reasonably  expected  to  "wash  out" 
with  increasing  sample  size. 

The  conclusion  is  quite  logical  in  retrospect.  It  merely  formalizes 
the  fact  that  if  our  data  is  to  give  us  no  information  about  the  conditional 
density  on  lines  of  the  form  {x  >  v }  then  we  will  have  no  reasonable  basis 
to  change  our  prior  beliefs  about  such  densities.  The  implication  of  the 
preceding  inconsistency  result  is  that  parametric  analysis  (either 

Bayesian  or  non  Bayesian)  is  mandated  for  the  competing  risk  problem  in 
system  reliability.  Much  as  we  might  like  to  proceed  non-parametricallv , 
even  Bayesian  analysis  cannot  give  improved  estimation  results. 
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Abstract 


Knowledge  of  the  8  percentile  residual  life  function  does  not  uniquely 
determine  the  underlying  distribution.  Additional  information  about  the  tail 
behavior  of  the  distribution  is  needed.  Knowledge  of  both  the  8^  and  8^  percentile 
residual  life  functions  does  uniquely  determine  the  distribution  provided  that 
8^  and  82  are  non-commensurate.  This  corrects  and  extends  the  results  of 
Schmittlein  and  Morrison  (1981). 


r 


In  reliability  studies  one  often  encounters  situations  in  which  analysis 
must  be  made  concerning  the  lifetime  of  an  entity  or  produced  product.  If  the 
analysis  must  take  place  when  the  object  under  study  has  already  aged  t  time 
units,  then  the  problems  regarding  the  non-f ailed  units  translate  into  inferences 
concerning  the  residual  life  at  time  t.  Traditionally,  this  inference  has 
involved  the  mean  residual  lifetime  function  E(T-t|T>_t),  where  T  is  the  random 
variable  denoting  the  total  life  of  the  object,  and  T-t  is  the  residual  life  at 
time  t.  By  a  differential  equation  argument  it  is  easily  shown  that  the  mean 

residual  life  uniquely  determines  the  distribution  of  T.  Thus,  a  linear  mean 

residual  lifetime  uniquely  characterizes  the  Pareto  (II)  distribution,  a  fact 
apparently  first  discovered  by  d'Addario  (1939),  and  rediscovered  periodically 
in  the  literature. 

For  some  problems,  it  is  convenient  to  work  with  the  median  or  some  other 

percentile  of  the  residual  life  rather  than  the  mean  residual  life  .  With 

the  heavy  tailed  distributions  commonly  encountered  in  survival  studies,  a  single 
long  term  survivor  can  have  a  marked  effect  upon  the  mean.  Additionally,  to 
calculate  the  mean  life  we  need  wait  until  every  unit  has  failed,  while  the 
median  residual  life  may  be  calculated  as  soon  as  the  majority  of  the  units  have 
failed.  For  the  percentile  of  the  residual  life  function  we  only  need  to 
wait  until  6  percent  of  the  units  have  failed. 

In  this  note  we  shall  show  that  in  general  the  8*”^  percentile  residual 
does  not  uniquely  determine  T.  We  shall,  however,  give  conditions  sufficient 
to  ensure  uniqueness  does  prevail.  In  particular,  we  shall  give  conditions  which 
ensure  that  a  linear  median  residual  lifetime  function  is  uniquely  a  Pareto  (II) 
property.  This  will  clarify  the  (incorrect)  theorem  in  Schmittlein  and  Morrison 
(1981) where  It  is  stated  that  the  median  residual  life  function  is  linear  if  and 
only  if  the  distribution  is  Pareto  (II). 

Non-uniqueness  of  the  percentile  residual  lifetime  function 

The  cumulative  distribution  function  of  the  random  variable  T  will  be 
denoted  by  F,  the  survival  or  reliability  function  for  T  will  be  denoted  by 
F(t)  =  P[T>t],  and  the  8^  percentile  of  the  residual  life  at  time  t  will  be 


denoted  by  mQ(t).  Throughout  we  shall  assume  F  is  continuous,  strictly  increasing 

D 

and  supported  on  (0,°°)  to  ensure  that  m^(t)  is  well  defined  for  each  8  and  t. 


The  relationship  between  F  and  mQ(t)  is 

p 

(1)  F(mg(t)+t)  =  (l-B)F(t) 

which  is  a  special  case  of  Schroder's  functional  equation  (4>  ( t ) )  =  6^(t).  This 
arises  frequently  in  the  theory  of  branching  processes  (see  Kuczma  (1968)  Chapter 
VI,  or  Seneta  (1968)  for  details  regarding  this  equation). 

Schmittlein  and  Morrison  (1981)  consider  the  equation  (1)  with  m,  (t)  =a  +  bt 
and  give  a  theorem  stating  that  m,  is  linear  if  and  only  if  T  has  a  Pareto  (II) 
distribution.  This  theorem  is  false.  Since  equivalents  of  the  functional  equation 

(1)  arise  so  often  in  character  .ation  theorems,  it  is  worthy  of  detailed  study. 

In  considering  the  uniqueness  properties  of  the  B^  percentile  residual 
life  function,  we  note  that  two  reliability  functions  F^  and  F^  have  residual 

function  m  (t)  if  and  only  if 

p 

(2)  m  (t)  =  F^1((l-3)F1(t))  -  t  =  F21(d-B)F2(t))-t, 
or  equivalently 

(3)  F2(F"1((l-3)F1(t)))  =  (l-$)F2(t). 

Thus- uniqueness  of  the  solution  of  (1)  for  a  given  residual  life  function 
mg(t)  is  equivalent  to  uniqueness  in  the  functional  equation  (3)  for  F2  when  F^ 
is  given.  Now  letting  G(x)  =  F2(F^(x)),  and  F^tt)  =  y  in  (3)  yields  the 
equivalent  function  equation 

(4)  G((l— 3)y)  =  (l-B)G(y)  0  <  y  <  1. 

Equation  (4)  arises  frequently  in  characterizations  of  the  exponential  distribu¬ 
tion  (cf.  Arnold  (1971),  Gupta  (1973),  Galambos  and  Kotz  (1978)). 

Next  we  note  that  given  any  monotone  increasing  continuous  solution  G  to 

(4)  with  lim  G(y)  =  0,  lim  G(y)  =  1  we  obtain  a  solution  F0  to  (3)  by  letting 
y-K)  y+1  1 

F2(t)  =  G(F^(t)),  and  this  ?2  also  has  m  (t)  as  its  B-percentile  residual  life 
function.  F2  is  distinct  from  F^  provided  G(x)  4  x.  Arnold  (1971)  and  J.  S.  Huang 
(1974)  have  noted  that  such  solutions  G(x)  4  x  can  easily  be  contructed,  even 
under  the  additional  assumption  that  G  be  differentiable  on  (0,1).  Graphically, 


a  solution  to  (4)  can  be  constructed  by  considering  an  arbitrary  monotone  function 
h(y)  for  (1-0)  <  y  <  1  which  satisfies  h(l-6)  =  1-8,  h(l)  =  1, 


and  then  extending  h  via  the  functional  equation  (4)  to  successive  subintervals 
k  k-1 

(1-0)  <  x  <  (1-g)  .  Clearly  then,  there  are  a  multitude  of  such  solutions. 

An  example  is  sketched  in  Figure  1.  Galambos  and  Kotz  (1978)  give  the  equation 
for  one  such  solution. 


(t  o-.i')  >  ■=  ( ' 

Figure  1  Construction  ot  a  nontinear  solution  to  (4) 

Let  G  denote  the  class  of  all  continuous  increasing  function  G  with 
lim  G(x)  =  0,  lim  G(x)  =  1,  with  G  satisfying  (4).  For  each  distribution  F^ , 

X-H)  x-*-l 

the  entire  class  F^  =  {F2 :  F^  =  G(F^) ,GeG}  has  the  same  8^  percentile  residual 

life  function  mQ(t).  In  particular,  unlike  the  situation  with  the  mean  residual 
p 

life  function,  the  median  residual  (or  more  generally  the  0-th  percentile) 
lifetime  function  does  not  uniquely  determine  the  distribution!  Thus,  the 
stated  theorem  of  Schmittlein  and  Morrison  (1981)  is  incorrect,  and  the  graphi 
cal  method  of  Figure  1  can  be  used  to  construct  many  distributions  other  than 
the  Pareto  (II)  distribution  which  possess  linear  median  residual  lifetime 
functions. 

As  a  positive  result  we  have  the  following  theorem. 

Theorem  1.  Given  two  distributions  F^  and  F^ 


(a)  ^  for  some  6,  ^  and  F£  both  have  Bth  percentile  residual  life  functic 


r,<(t)  ,  and  also  lin 


F„(x) 


x-*«  F  U) 


1,  then  F1=  Y ^ 


(b)  If  mg  (t)  and  (t)  are  the  8^  percentile  residual  life  functions  of 

both  and  F^  for  any  two  noncommensurate  percentiles  8^  and  821  then  F  =  F  . 

Proof.  If  F.  is  given,  then  the  set  of  reliability  functions  ^  which  solve  (.3,1 

is  the  set  F  described  previously.  Conversely,  given  we  may  construct  a 

member  GeG  via  G(x)  =  F,(F^  ^(x)).  Thus  uniqueness  is  equivalent  to  giving 

conditions  on  F^  and  f.,  which  insure  G(x)  =  F^(F^  ^(x))eG  is  the  identity.  A 

sufficient  condition  on  G  to  insure  this  is  lin  G(x)/x  =  1  (lor  a  proof  see 

x-*-0 

Arnold  (1971)  or  Gupta  (1973)  ,  however  the  plausibility  of  the  result  is  immediate 

-  -  -1  ?2U) 
from  figure  1).  For  G(x)  =  F2(F^  (x))  this  condition  is  equivalent  to  lim  -  =  1. 

x-*»  1  ( x ) 

To  prove  (b).  suppose  that  and  have  identical  residual  functions 
m^  (t),  i=l,2.  Then  G(x)  =  FjCFj  (x) )  satisfies  (A)  for  8^  ancl  ^  two 

non-commensurate  values  of  6.  Iterating  (A)  we  find 

I  (1-6  )k  \  (1-8  )k 

Gl  - «■  x)  = - -r  G(x)  for  all  £  and  k. 

\  (1-8.,)  '  J  (l-82) 

Since  the  collection  of  ratios  above  is  dense  in  [0,1]  it  follows  that  G(x)=x  for 
xe[0,l],  and  hence  =  F^. 

Theorem  2.  Let  mQ(t)  be  given 

^  -  F  (x) 

a)  For  any  fixed  function  g,  let  F  =  { F :  lim  — =  cl.  Then  there  exists 

8  x^cc  S(x) 

at  most  one  reliability  function  FcF  with  Btkl  percentile  residual  life  function 

g 

m  (t) . 

P  a_ 

b)  Let  F  =  (F:  lim  x  F(x)  =  c  4  0}.  Then  there  exists  at  must  one  FrF 

Ot  Ci 

X~*<Xs> 

with  m„(t)  as  its  8tl'  percentile  residual  life  function. 

p 

Proof .  Part  (b)  is  just  a  particular  case  of  (a)  with  g(x)  =  x  U,  however  it  is 
stated  sf^araiely  because  of  its  intimate  connection  witli  existence  of  moments. 

Namely,  the  class  F  is  only  slightly  larger  than  the  class  of  all  oi stribufions 


-5- 


with  finite  moments  only  up  to  a.  It  also  contains  the  class  implicitly  assumed 
by  Schmittlein  and  Morrison  (1981)  in  their  study  of  linear  median  residual  life 
functions.  The  lacuna  in  their  paper  is  between  equations  (10)  and  (11)  in  which 
they  restrict  their  attention  to  distribution  functions  in  a  subset  of  F 

a 

By  (b)  there  is  uniqueness  here. 

For  the  proof  of  (a),  assume  F^.F^e  F^  are  two  solutions  to  (2).  Then  G(x)  = 
F2(F1"1(x))eG 


F2(y)  F?(y)/g(y) 

solves  (4),  and  lira  G(x)/x  =  lira  - -  =  lira  ~ -  =  1.  By  the  argument 

x-K)  y^o  Fx(y)  F1(y)/g(y) 

used  in  the  proof  of  part  (a)  of  Theorem  1,  it  follows  that  F2  =  F  ;  i.e.,  there 
is  at  most  one  member  of  Fg  with  m^(t)  as  its  residual  life  function. 


To  end  on  a  happier  note,  we  mention  that  the  results  of  this  paper  give  condi¬ 
tions  under  which  the  general  residual  lifetime  function  determines  the  distri¬ 
bution,  and  are  not  restricted  to  the  linear  median  residual  life  and  Pareto  law. 

In  particular  the  results  apply  to  all  the  residual  life  functions  outlined  in 
Table  2  of  Schmittlein  and  Morrison. 
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1.  Introduction 


In  this  paper  we  consider  properties  of  the  stochastic  process  Y(t)  = 

E  f(t,s,Js)  where  Js  =  X(s)  -  X(s-O)  is  the  (random)  size  of  the  jump, 

s  is  the  (random)  time  of  the  jump  of  an  additive  stochastically  continuous 

process  X(s)  with  no  Gaussian  component,  and  f  is  an  appropriate  function. 

In  the  case  f(t,s,x)  =  xl,  ,  (where  I(  denotes  the  indicator  function  of 

l  s  £  t  J  A 

the  set  A),  we  recapture  the  driving  process  X(t).  The  case  f(t,s,x)  =  g(x) 
yields  a  variational  sum  for  the  process  with  independent  increments.  Such 
variational  sums  have  been  studied  extensively  in  the  literature  under  a 
variety  of  conditions  on  g  and  X(s)  (e.g.,  [1],  [2],  [4],  [5],  (6),  [7],  [8], 

[9],  [10],  [11],  [12],  [13],  [14],  [15],  [17],  [23],  [-4]).  Analysis  of  Y 
yields  information  concerning  the  sample  path  properties  of  the  processes  X(t). 

The  choice  f(t,s,x)  =  h(t,s)x  yields  the  representation  Y(t)  =  /  h(t,s)dX(s). 
Such  processes  are  called  linear  processes  by  Eastwood  and  Lugannani  (1977) 
and  are  useful  in  signal  detection  models.  Another  important  applied  motiva¬ 
tion  for  considering  general  variational  sum  Y(t)  outlined  above  is  that  the 
likelihood  ratio  of  certain  jump  processes  is  of  this  form.  Specifically,  if 
X^(t)  and  X9(t)  are  two  stochastically  continuous  processes  with  independent 
increments  and  "time-jump"  measures  M  and  N  respectively  (see  section  2  for 
the  definitions)  then  Brockett,  Hudson  and  Tucker  (1978)  show  that  the  likeli¬ 
hood  ratio  involves  the  variational  sum 


CO  dM 

nii  (E  {In  —  (s,Js) 


0  <  s  <  T, 


Js  <;  I  [ 
n 


/(f  -  DdN) 
n 


dM 

which  is  of  the  above  form  with  f(t,s,x)  =  In  —  (s,x).  Thus  if  a  Neyman-Pearson 
detector  is  to  be  implemented,  such  variational  sums  need  to  be  investigated. 

An  additional  application  of  such  generalized  linear  processes  is  rhe  ex¬ 
tension  of  Middleton's  model  for  reverberation  in  underwater  acoustical  environ¬ 
ments  to  include  random  amplitude  components  (cf.  Middleton  1967a, b,  1972a, b). 

The  derived  model  is  precisely  of  this  form. 
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Approximations  to  the  variational  sums  using  observed  data  (e.g.,  in  order 
to  implement  the  likelihood  ratio  detector  for  additive  processes)  necessitates 
investigation  of  the  approximating  sums 

Vfn.f.X)  =  L  f ( t , s . , AX . ) 
c  J  J  1 

where  n  =  {s^  <  s.  <  s„...<  s  f  is  a  partition  of  [0,T]  and  AX.  =  X(s  )  -  X(s.) 

0  1  2  n  j  J+1  ] 

is  the  increment  of  the  process  over  the  j*"*1  subinterval.  We  shall  investigate 
when  V  (II,f,x)  converges  a.s.  to  Y(t)  =  Z  f(t,s,Js)  as  the  norm  of  the  mesh 
I |n| I  goes  to  zero.  This  will  also  generalize  the  theorem  in  Riedel  (1980) 
concerning  existence  of  the  stochastic  integral  as  a  limit  of  Stieltjes  sums. 

Most  previous  investigators  of  variational  sums  considered  functions  of 
the  form  f(t,s,x)  =  |x|P  and  processes  X(s)  having  stationary  independent 
increments.  Additionally  most  assumed  a  nested  sequence  of  partitions  FI  -*■  0. 

Our  results  generalize  their  results  in  that: 

(i)  our  functions  f  allow  dependence  on  the  time  as  well  as  the  size 
of  the  jump  (this  is  necessary  for  the  likelihood  ratio  problem 
in  the  non-stationary  case), 

(ii)  we  consider  non-stationary  driving  processes  X(s), 

(iii)  we  consider  limits  along  a  net  of  partitions  (Fl}  0  rather  than 

nested  sequences  of  partitions. 

In  section  2  we  consider  existence  properties  of  the  variational 
sums  Z  f(t,s,Js)  in  terms  of  f  and  the  time-jump  measure  of  the  driving 
process  X,  and  in  section  3  we  consider  limits  for  the  approximating 


sums  E  f (t , s . , AX . ) . 

n  J  J 
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2.  Variational  Path  Properties 


Throughout  this  paper  we  shall  let  X(t)  be  a  separable  stochastically 
continuous  stochastic  process  with  (not  necessarily  stationary)  independent 
increments  over  [0,T],  and  without  a  Gaussian  component.  Without  loss  of 
generality  we  can  and  do  assume  X(*,w)  £  D[0,T],  i.e.,  the  sample  functions 
of  X  are  right  continuous  with  left  limits  at  each  point.  We  also  assume 
X(0)  =  0.  Then  X(t)  is  infinitely  divisible  with  characteristic  function 

.  x  (u)  =  explia(t)u  +  (e  -  1  -  v)dH£  (x)  ] 

'  '  ;  C-00,00)  x 


The  parameters  a(t)  and  M  determine  the  distribution  of  X(t). 

The  time-jump  measure  of  the  process  X  is  defined  over  fO,T]  x  R  by 

M((t  ,t  ]  x  A)  =  M  (A)  -  M  (A).  It  is  known  that  M  ((  f  ,t,]  x  a)  is  the 
1  1  c2  C1  1  " 

expected  number  of  jumps  of  X(s)  for  s  £  ( t ^ , t ^ )  with  jump  sizes  Js  =  X(s)  - 
X(s-O)  £  A. 

Our  first  results  concern  existence  of  Y(t),  i.e.,  properties  of  f  and  M 
will  yield  convergence  of  the  sum  defining  Y(t).  Theorem  1  below  improves 

.  .  V 

results  due  to  Blumenthal  and  Cetoor  (1961) in  the  case  f(t,s,x)  =  |x|  ,  and 
Fristedt  (1967)  in  the  case  f(t,s,x)  =  h(x)  where  h  is  strictly  increasing 
non-negative  with  h  ^  convex.  As  a  particular  case  it  gives  necessary  and 
sufficient  conditions  for  existence  of  a  linear  process  much  weaker  than  those 
in  Lugannani  and  Thomas  (1967)  . 

Theorem  1  Let  f  be  a  measurable  real  valued  function. 


Then: 


or 


a) 


S  £ 

S  £ 


'-[O.T]  I  f  (t’s»Js)  I 


<  oo 


=  oo 


a .  s . 


a .  s . 


which 


tA. 
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b)  E  .  |f  (t,s,Js)  |  <  «>  a.s. 

•sc  l  o ,  r  j 

if  and  only  if 

/  { I f (t,s,x) I  A  U  M(ds,dx)  <  ® 

[0,T]  <  [-1,1] 

(here  f  A  1  denotes  min  {f,l;>. 

Proof  Without  loss  of  generality,  we  may  assume  f  >_  0.  Moreover,  since  there 

are  only  a  finite  number  of  jumps  of  absolute  magnitude  exceeding  one,  it 

follows  that  E  f(t,s,Js)  <  »  if  and  onlv  if  Z  ff(t,s,Js):  |.Js|  <  1}  < 
s  s 

Also,  if  we  take  B  =  {(s,x):  f(t,s,x)  >  l},  then  /.  { f  ( t ,  s ,  Js)  :  (s,Js)  t  B  ;  <  00 
o  u 

if  and  only  if  M(B  )  <  Thus,  without  loss  we  assume  0  <  f(t,s,x)  <  1,  and 
o  ~ 

| Js |  <  1  a.s.  for  the  remainder  of  the  proof. 

Now,  for  k  =  1,2,...,  let  B.  =  t(s,x):  1  '  '  f(t,s,x)  <2  ;.  Hie 

K 

random  variables  N,  ~  E  {Js:  (s,Js)  c  B,  }  are  independent  Poisson  variables 
k  x 

with  expectation  M(B(  ) . 

The  proof  of  a)  follows  from  the  Kolmogorov  zero  one  law  by  writing 
E  f(t,s,Js)  =  kfx(E  {f (t,s,Js) :  (s,Js)  t-  BR})  as  a  sum  of  independent  random 
variables.  To  prove  b) ,  we  first  observe  that  E(E  f(t,s,Js))  =  f  i (t,s,x)M(ds,dx) 
so  that  convergence  of  the  integral  is  certainly  sufficient  for  E  f(t,s,Js)  <  00  a.s 
On  the  other  hand,  suppose  .  f ( t , s ,x)M(ds ,dx)  =  co.  We  then  compute 

E[exp(-E  f(t,s,Js))]  £  E I  exp  (-|_£  ^  2  kNR)  ]  =  RA^  E[exp(-2  NR)  I 
=  I]  exp{M(B,  )  [exp(-2  k)  -  1 1  i  =  exp;  A  [exp  (-2  k)  -  L! 

However,  e  X  -  1  -  -x  as  x  *•  0  so  that  ..  (exp(-2  )  -  I  I  M ( )  >  —  *’  if  and 
a  _k  ^  >i  -k+i 

only  if  |  2  M(B  )  <  '•«>.  Since  “>  =  /  f  (t,s,x)M(ds  ,dx)  _  RE  j  2  M(TiR)  it 

follows  that  E [ exp (-E  f(t,s,Js))]=  0,  i.e.,  "  f(t,s,Js)  =  °°  a.s. 

s  sclO,T] 


3.  Variational  Sum  Approximation 
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In  this  section  we  consider  the  approximation  of  Y(t)  by  variational  sums 

V  (II,f,X)  =  l  f  ( t ,  s .  ,  AX . )  ,  where  II  =  (sn  <  s.  <...<s  )  is  a  partition  of  [0,T] 
t  j  j  j  ui  n 

and  AX.  =  X(s.,,)  -  X(s.)  is  the  observed  increment  of  the  process  X  over  the 
J  J+l  J 

partition  interval  (s  ,s.  Consideration  of  the  limiting  behavior  of  V  (n,f,X) 

J  J+l  c 

as  |  J II  |  |  —  0  is  natural  in  light  of  the  likelihood  ratio  representation  for 
additive  process,  and  also  the  representation  of  linear  processes  as  stochastic 
integrals.  In  general,  we  would  like  to  know  in  what  manner  the  Stieltjes  sums 
converge  to  a  stochastic  integral.  Throughout  we  assume  that  for  each  t,  f(t,  •  ,•) 
is  continuous  on  [0,T]XR. 


The  "strong  variation"  Wt(f,X)  is  defined  by  Wt(f,X)  =  sup  Vt(II,f,X) 
where  the  supremun  is  over  all  partitions  of  [0,T],  We  shall  also  consider 
limits  of  V  (n,f,X)  as  ||ll||  =  max(s.J_1  -  s.)  0.  In  general  lim  U(II) 

J  J  ||n||-o 

denotes  the  limit  along  a  net  directed  by  partitions  partially  ordered  by  mesh. 
Thus  lim  U (IT )  exists  if  and  only  if  lim  sup  U(JI)  =  lim  inf  U (IT) 

1 1 n |  |-o  64-o  1 1 n 1 1<6  64-o  1 1 n ] 

and  both  are  finite. 

We  shall  refer  to  the  following  consequence  of  the  sample  path  properties 
of  additive  process  as  the  truncation  principle.  It  has  been  used  by  many 
authors,  but  it  is  convenient  to  formulate  it  explicitly  here. 

Truncation  Principle  Let  f(t,s,x)  be  a  Borel  measurable  function  which  is 
right  continuous  in  s,  and  continuous  a.e.  [ )  as  a  function  of  x.  If  +  e 
are  not  atoms  of  M_,  then 


lim  E 

! I n | |-o  j 


f(t,sjfAX  )Ij|  |>£]  _  y  f(t,s,Js)I 

J  ^ 


I  Js I >e] 


The  following  lemma  will  be  useful  for  analysis  of  limiting  properties 
of  Vt(n,f,X).  A  version  was  noted  in  a  remark  of  Fristedt  and  Pruitt  (1972)  p.65. 
Lemma  1  (Extended  Kolmogorov  zero-one  principle)  Let  (X^(t) ,  t  c  I ,  k  >  1 )  be 
a  sequence  of  independent  stochastic  processes.  Let  1 Y  ( t ) ,  t  t ,  k  n  ; 
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QO 

and  F  =*  F  .  Then  A  e  F  implies  P(A)  is  zero  or  one. 

Proof  Let  £  >  0  be  arbitrary,  and  suppose  A  £  F.  Since  F^  is  generated 

by  the  algebra  consisting  of  finite  disjoint  unions  of  sets  of  the  form 

S  [Y,  (t  )  £  A  ],  k  ^  n,  it  follows  that  there  exists  a  set  of  the  above 
j»l  Kj  J  j  J 

form  with  P(A  A  B)  <  £.  The  remainder  of  the  proof  follows  by  standard 
arguments  (see,  e.g.,  Breiman  (1968)  page  40). 

We  now  apply  this  lemma  to  the  strong  variation  process. 

Theorem  2  |wt(f,X)  <  <=°  a.s.  or  |  tf ,  X)  |  =  ■»  a.s. 


Proof  Let  (t)  be  the  process  derived  from  X(t)  by  deleting  all  jumps  of 

magnitude  greater  than  e,  i.e.,  X  (t)  =  X(t)  -  E  {js:  jjs|  >  e},  and  let 

{6^}  be  a  sequence  with  6^  -*■  and +6^  not  atoms  of  M^.  From  the  sample 

(6,) 

path  properties  of  X  it  follows  that  | Wt (f  ,X)  |  <  00  iff  |wt(f,X  )  |  <  °°  for 
every  k. 


Let 


*k(t) 


r(<W 


(6.) 

(t)  -  x  K  (t)  =  I  {Js: 


5,  <  Js  <  6 
k  —  1 

(6  ) 


k~l 


The 


processes  X  are  independent,  and  the  random  part  of  X  is  a  function  of 

(V 

{*k(t)  *  e  k  n)  and  hence  X  is  measurable  with  respect  to 


Fn  =  a{Xk(t)’  c  e  [ 0 »T]  »  k  >  n}. 


Thus 


<v 


,  for  each  n  [  |  Wfc(f  ,X)  |  <  “]  =  [  |Wf  (f  ,X  )  | «»] 


_  F^,  so  the  above  event  is  a  tail  event,  which  by  lemma  1  must  have 

probability  zero  or  one. 

We  may  also  apply  the  zero-one  principle  to  obtain  a  characterization 
of  convergence  of  variational  sums.  The  following  result  extends  the 
convergence-divergence  dichotomy  observed  by  several  authors  for  functions 
f(t,s,x)  *  |r.|P  to  the  more  general  framework. 

Theorem  3  For  anv  te[0.T] 

P[  lim  V  (n,f,X)  exists]  is  zero  or  one.  Moreover,  if  lim  V  (TI,f,X) 

I  In! |-o  1 1 n 1 1 -o  c 
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exists,  and  /  | f (t ,s ,x) | M(ds ,dx)  <  00  (so  that  the  limit  is  finite). 

[0,T]x[-l,  1] 

then  the  limit  is  of  the  form  lim  V  (n,f,X)  =  C  +  Y(t)  where  Y(t)  =  T.  f(t,s,Js), 

! I n | l-o 

and  C  is  a  constant. 

Proof  Choose  +6  not  to  be  an  atom  of  M^,.  A  truncation  argument  shows  that 
lim  V  (n,f,X)  =  Z  {£(t ,s,Js) :  Ijsl  >  6}  +  lim  V. (n,f ) 

IlnlN  '  |  |n|  |»o  c 

/  n 

provided  either  side  exists.  Hence,  lim  V  (JI,f,X)  exists  if  lim  V  (I7,f,X^  ') 

I  lull-0  '  I |n| l-o  c 

exists  for  all  6.  The  method  of  proof  used  in  Theorem  2  shows  that  the  event 
[  lim  V  (n,f ,X)  exists]  is  a  tail  event  and  so  has  probability  zero  or  one. 

Ilnll-o  c 

Now  assume  lim  Vt(n,f,X)  exists  a.s.  and  suppose  /  | f ( t , s ,x) | M(ds ,dx)  <  °° 

[0,T]x[0,l] 

so  that  according  to  Theorem  1,  Z  f(t,s,Js) 

converges  absolutely  with  probability  one.  Let  D  =  lim  V  (n ,f ,X) -If (t , s , Js) . 

I  In!  ho 

By  the  above  it  follows  that  D  is  finite  and 

D  =  lim  V  (II ,  f ,  X  )  -  Z  (f(t,s,Js):  |js|  <  6).  Again  the  method  of  proof 

I " 

of  Theorem  2  shows  D  is  tail  measurable,  and  hence  must  be  constant  a.s,  This 
completes  the  proof. 

It  should  be  noted  that  the  constant  C  need  not  be  zero  (i.e.,  the 

variational  sum  need  not  converge  to  what  we  would  like  it  to) .  An  example 

2 

of  Fristedt  and  Taylor  shows  that  for  f(t,s,x)  =  x  different  limits  can  occur 
(see  example  1,  page  275,  of  Fristedt  and  Taylor  (1973)).  In  the  next  section 
we  consider  a  further  specialization  of  f  sufficient  to  ensure  convergence 
to  the  desired  limit.  It  will  include  the  linear  processes,  and  also  many 
important  likelihood  ratios. 

64.  Applications  to  Linear  Processes  and  Signal  Detection 

In  this  section  we  shall  consider  functions  of  the  form  f(t,s,x)  = 
g(t,s)h(x)  where  g(t,s)  is  uniformly  bounded,  and  h(x)  =  0(|xh)  for  some  y  >  0. 

By  Theorem  1,  the  desired  variational  sum  limit  Z  f(t,s,Js)  will  converge  if  and 


1 

<  ”,  so  we  shall  assume  / 

-1 
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only  if  [q^JxI;-!,!]^  >s  >x)M(ds ,dx) 


|x|yM  (dx)  <  ”, 


and  y  <  2. 


In  order  to  prove  the  theorems  of  this  section  we  will  need  inequalities 

due  to  Millar  and  to  Bretagnolle.  It  is  convenient  to  state  these  results 

here.  The  Millar  inequalities.  Theorem  2.1  of  [23],  are  not  stated  exactly 

as  in  his  paper  but  are  easily  obtained  from  it  by  a  careful  reading  or  his 

proof  and  taking  X  =  Y(l)  where  Y(t)  has  stationary  independent  increments. 

Theorem  (Millar):  Let  X  be  an  infinitely  divisible  random  variable  with 

i.  ti  x  1 1  x 

characteristic  function  exp[itb  +  /(e  -  1  -  -jy— ^-)M(dx)  ] .  Suppose  that  the 

support  of  Levy  measure,  M  is  contained  in  [-a, a]  ,  0  £  a  and  /  txl  'M(dx)  • 

Jx|<a 

(i)  If  a  >  1  then  E|x|a  <  2/,  .  |x|aM(dx)  +  2a  I  EX  I 
—  1  '  —  |  x  |  <_  a  '  1  . 1  1 

(ii)  If  a  <  1  and  if  b  =  /i  i  .  —“7  M(dx) ,  then  E|x|a  <  f\  .  ,xi  Sl(dx) 

We  will  combine  Millar's  inequalities  with  those  of  Bretagno  lie ' s 
Theorem  2,  [A],  stated  here  in  the  restricted  form  which  we  will  actually  use. 
Theorem  (Bretagnolle)  Let  X^,...,Xn  be  independent  random  variables  with 
EX^  =  0,  1  i  _<  n.  Let  S(k)  =  and  let  !?  be  the  class  of  all  finite 

subsets  n  =  {n  ,n  , ...,n  },  0  =  n  <  n  <  ...  <  n  =  n  of  {0,l,...,n}.  Then 

U  -L  K  U  1  K. 

for  1  <  p  <  2, 

EZ"|X.|P  <  sud  Z  _  E I  S  (n . )  -  S(n.  _)|P  <  C  l"  F.|x.|P 
1  i  —  _  n.eil  1  i  l-l  1  —pi  1  i' 


where  is  a  constant  depending  only  on  p. 

The  case  p  <_  1  of  Theorems  A  and  5  below  are  proved  in  Hudson  and  Mason 

(1976).  We  will  give  the  proofs  for  p  >  1.  C  denotes  a  constant  depending 

P 

only  on  p.  Theorem  A  below  extends  the  results  of  Theorem  3  of  Bretagnolle  [A] 

We  shall  write  W  (p,X)  and  V  (Tl,p,X)  for  W  (f,X)  and  V  (fl.f.X)  when  f(t,s,x) 

8  8  t  C 

=  g(t,s) | x |  •  We  shall  assume  from  here  on  that  a(t)  is  of  bounded  variation 

2 

if  p  >  1,  and  /  X/(l+x“)  Mt(dX)  <  »  if  p  <  1. 

I.^mma  2.  Suppose  X(t)  **  (a(t)  ,M  )  and  0  <  p  <  2  and  1  )  =  {)  for  some 
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a  >  0.  If  E(X(t) )  =  0,  Chen 


E(Wg(p,X))  <  Cpg  /  ixj^Cdx) 


Proof.  Let  denote  the  class  of  all  partitions  of  [0,T]  consisting  of  nt*1 

order  binaries.  Let  =  {sup  V(iT,|xjP,X):  II  e  P^}.  By  the  samPle  path 

properties  of  X  it  follows  that  W(p,X)  £  C  (W  (p,X)  and  Y  f  W  (p,X)  a.s. 

Lei  n  be  fixed  and  let  X  =  X(j2~n)  -  X((j-l)s~n),  1  <_  j  £  2nT.  By  Bretagnolle ' s 

inequality  we  have  E(Y  )  <  C  Z  E|x.|P  for  some  constant  C  depending  only 

n-p.j  P 

on  p.  Since  E(X^)  =  0,  we  may  use  the  inequalities  of  Millar  to  obtain 


IE|X  |P  <  E{2  /  | x | P (M  (dx)  - 

j  J  ~  j  j 

IP' 


M 


(dx))}  =  2  /  | x | 1  M  (dx) 

j-1 


Thus 


E(Y^)  <_  2C  /  !x|PM^,(dx)  <  ».  The  monotone  convergence  theorem  shows 


W^(p,X)  <  °°  and  then  this  completes  the  proof. 

Theorem  A.  If  X  -  (a(t),M(_)  is  a  general  process  with  independent  increments, 
W  (p,X)  <  00  a.s.  if  and  only  if  j  jx|PdlL,(x)  <  k'°- 

8  j  x  i  <  i  T 

Proof.  To  see  that  /  ,x!PM_(dx)  <  °  implies  W  (n,X)  <  °°  a.s.  for  X  as 
-  |x|<l*  ^  8 

above,  we  use  truncation.  Lemma  2  and  the  fact  that  W  (p,X)  <  00  a.s.  if  and 

8 

only  if  W  (p,X)  <  Write  Y^(t)  =  X^\t)  -  f^(t)  where  f^(t)  is  defined 

by  f^(t)  =  a(t)  -  5  rfr  H(dx)  -  /  (X  2  -  x)M  (dx) .  Since  X(t) 

| x | >1  1+X~  C  | x | <1  L+X  C 

is  stochastically  continuous,  f*^(t)  is  continuous.  Also  f^(t)  is  of 
bounded  variation  since  for  any  Borel  set  A,  M  (A)  is  a  nondecreasing  function 


of  t.  The  characteristic  function  of  Y^(t)  is  exp[  / 


x  <1 


(e1UX  -  1  -  iux)M^ (dx) ] 
(1). 


and  so  EY,  (t)  =  0.  By  Lemma  2,  W  (p,Y  )  <  00  a.s.  Hence  W  (p,X  )  <  00  a.s. 

1  g  1  A 

Let  N  denote  the  number  of  jumps  of  X(t),  0  t  £  T,  of  absolute  value  greater 
than  one.  The  sample  path  properties  of  the  X(t)  process  show  that  with 
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probability  one  the  paths  are  bounded,  by,  say  B  =  sup{|X(t)|:  0  <  t  <  T}  <  <®. 
Then  for  any  partition  II,  there  are  at  most  N  terms  of  V(n,|x|P,X)  wiich  do  not 
occur  in  V(II,  |  x|  P,X^^ ) .  (The  increments  of  X(t)  and  of  X^^(t)  are  the  same 
unless  an  interval  contains  a  jump  of  absolute  value  greater  than  one.)  Now 
N  is  a  Poisson  random  variable  with  expectation  /  dlL.  Thus  for  any 

M>!  ^ 

partition  IT  of  [0,T],  V(IT,jx|P,x)  <  V(II,  )x|P,X(1))  +  2NBP.  That  is, 

W1(p,X)  £  W1(p,X(1))  +  2NBP  <  0°  a.s. 

It  remains  to  show  that  if  /  |xjPfL(dx)  =  »,  then  W  (p,X)  =  00  a.s. 

W<i  T  1 

As  a  corollary  to  the  Truncation  Principle  we  ohserve  that  £{|j(s)|p:  0  <  s  < 

<  lim  inf  V(II,|x|P,X)  <  W  (p,X).  But  by  Lemma  2,  I  |>:|PM  (dx)  =  *> 

||n||-o  |x|<l 

implies  that  E{|j(s)|P:  0  <  s  <  T}  =  °°  a.s.  By  the  above  inequality,  W^(p,X) 

*  °o  a.s.,  which  implies  W  (p,X)  =  00  a.s. 

8 

The  following  theorem  is  well  known  for  nested  sequences  of  partitions. 

See  Kallenberg  (1974),  or  Hudson  and  Mason  (1976). 

Theorem  5.  If  X(t)  ~  (a(t),ht),  0  <  p  <  2,  and  f(t,s,x)  =  g(t,s)|x|P  with 
g  bounded,  then  lim  V  (II,f,X)  =  Zf  (t ,  s ,  Js)  a .  s .  The  limit  is  finite  if 

llnlbo  c 

/  |  x  |  PM_(dx)  <  °°. 

|x|<l  ^ 

Proof.  Let  0^(t)  =  a(t)  -  /  x/ (l+xz  )M.  (dx)  -  /  (—r  -  x)M  (dx) . 

-  6  I  I  til  l+X*-  t 

|x|>e  |x|<c 

By  the  truncation  principle,  and  path  properties  we  have  for  +£  not  atoms  of 
Tim  V(II,f,X)  <  TTm  E{  f  (t  ,s .  ,  AX.  )  :  |AX.|  <c} 


l"l  N 


n  ho 


j  i 


+  lim  E  { f  ( t ,  s  . ,  AX .  )  :  |AX.|  '■cl 

I  Ini  l-o  -1  J  J 


£  E{f  (t,s,Js) :  j  Js  |  _>  e} 
]  lim 

8  I  Ini l-o 


+  C  lim  v(n , | x | p, x  (c)) 
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where  X^  (t)  =  X(t)  -  I{J(s):  )j(s)|  _>  el.  Let  £  4  0  to  get 

(4.1)  lira  v(r  f,X)  <  Zf(t,s,Js)  +  C  lira  lira  V(IT,  |  X  | P  ,X (c) )  . 

|  |  n  l  |—0  8  £40  |  |  n  |  |  -*-0 

We  will  show  that  lim  lim  V ( K , | x| P ,X^£^ )  =  0  a.s.  Now  by  the  C  inequality, 

--+o  !  | n||-o  '  r 

(4.2)  V(n, |x|p,x(£))  <  2P_1V(n, |x|P,f (e))  +  2P_1V(n, |x|P,Y(£))  where  Y(e)(t) 

=  X^  (t)  -  g^(t).  Note  that  Y^(t)  satisfies  the  conditions  of  Lemma  2 

so  that  EW  (p,Y^)  _<  C  ft  <  |x|PJL(dx).  Now  for  p  >  1,  the  fact  that  g(t)=lim  q^ 

i  p  | x | <e  i  ( t ) 

is  continuous  and  of  bounded  variation  implies  that  for  anv  e  >  0 

(4.3)  Ti^  V(n,|x|P,e(c))  =  0. 

I  |n|  ho 

Thus 

Tim  V(n, |x|P,X(£))<  2P_1  TiS  V(n,|x|P,Y(e)). 

||n|ho  Mill  ho 

But  for  6  >  £ 

li^  V(H,  |xjP,YCe))<  2P_1  TTm  V(n,|x|P,Y(l5)  -  Y(e)) 

I I  n  |  ho  1 1  n  |  ho 

+  2P_1  mm  v(n,|x|p,Y(<5)) 

I  |n|  ho 

<  2P_1  Z{ | J(s) | p :  e  <  |J(s)|  <  5} 

+  2p"1W1(p,Y(6))  . 

Thus,  E[lim  Urn  V(K, | x | P ,Y(e))  ]  <  2P_1EW  (p ,Y (6) )  +  2P_1  /  lx|PM(dx). 

£40  1 |n| ho  e<|x|<6  T 

<  2P  /  |  x |  P1L (dx)  +  2P  h  I  |  x  |  PM (dx)  +  0  as  640. 

£<|x|<6  p  |x|<6 

Since  V(f[,  |  x|  P,Y^ )  0,  this  shows  that 

lim  lim  V(L,|  x  |  P  ,Y  )  =  0  a.s. 

eio  1 1  n |  ho 


(4.4) 


1 
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By  (4.2),  (4.3),  and  (4.4),  we  get  lim  lim  V(IT,|x|^,X^  ^)  -  0. 

•  j|n. |-o 

But  by  (4.1),  this  implies  that  lim  V(FI,f,X)  _<  E{f(t,s,Js):  0  <  s  _<  T}. 

I  i  n  1 1  --0 

As  a  corollary  to  the  truncation  principle  we  know  that 
E{f(t,s,Js):  0  <  s  <_  T }  j<  1  im  V  ( H ,  f ,  X ) 

i  jn| j-o 

and  hence,  lim  V(TI,f,X)  =lff(t,s,Js):  0  <  s  <_  T}. 

1 1  n 1 1 -o 

Linear  processes  considered  by  Eastwood  and  Lugannani  are  particular 

cases  of  the  results  of  the  previous  sections  with  Y(t)  =  g(t,s)dX(s) 

a 

=  Eg(t , s) Js .  By  Theorem  5  we  observe  that  the  natural  variational  (pathwise) 
approximations  Z  g(t,s.)AX.  converge  a.s.  to  Y(t).  Our  results  show  the 

n  J  J 

convergence  is  almost  sure  for  a  class  ’hich  includes  linear  processes. 

Consequently  pathwise  approximation  can  be  done  as  outlined  previously  once 

the  filter  function  g  is  known.  Riedel  (1980)  also  considers  stochastic 

integrals  with  convergence  being  in  probability.  Our  results  also  extend  his. 

As  another  application  of  the  previous  results,  we  can  approximate  the 

likelihood  ratio  of  two  processes  with  independent  increments.  An  explicit 

formula  for  In ( dux dux 2)  for  two  additive  processes  has  been  given  in  Brockett. 

a  , 

Hudson  and  Tucker  (1978)  as  1  (£{ln  -p-r  (s,Js):  0  <  s  <  T,  Js  e  I  ) 

,  dN  -  -  n 

n=l 

dM 

f  (nrr  -  l)dN),  where  I  =  {x:  t  .  <  x  <  £  1,  e  *0  and  M  and  N  are 
dN  n  n+1  -  n  n 

[0, I ]xl 

n 

the  time-jump  measures  of  the  processes  X^(t)  and  X^(t)  respectively.  or 

actual  computation  of  dux^/dux2  8i-ven  3  sample  function,  some  approxinat ion 

dM 

must  be  made  to  this  infinite  sum.  Using  our  results  with  f(t,s,x)  =  ln^rr  (s,x), 

(IN 


13 


we  observe  that  an  a.s.  consistent  estimate  can  be  obtained  by  using 

This  gives  an  alternative  approximation  technique  to  that  utilized  by  Stuck  (1976) 

for  assessing  the  likelihood  ratio.  Essentially  he  assumes  no  jumps  of 

magnitude  less  than  E  occur,  and  then  calculates  the  finite  summation  £fln^(s,Js)} 

dN 

exactly  for  these  approximating  processes.  Here  we  use  the  entire  sample  function 
for  the  untruncated  process,  but  use  a  variational  sum  approximation  for  the 
contribution  of  the  small  jumps. 
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§1  Introduction 


In  the  study  of  multiple  decrement  lifetimes  one  considers  a  group 
of  K  lives.  The  group  is  said  to  survive  at  least  as  long  as  all  members 
are  living.  In  practice,  the  component  lives  are  usually  combined  in 
annuity  or  insurance,  or  some  other  common  undertaking,  and  frequently 
are  related  by  blood,  marriage  or  some  joint  undertaking  which  simulta-r 
neously  exposes  the  individuals  to  common  hazards  and  mortality  risks. 

It  follows  that  in  general  the  joint  survival  function  of  the  group  should 
exhibit  dependence  between  the  component  lives.  The  method  of  analysis 
now  commonly  used  for  multiple  decrement  analysis  however  assumes 
independence  of  the  joint  lives  and  joint  life  insurance  payable  upon  the 
first  death  in  the  group  is  calculated  using  this  assumption  (c.f.  Jordan 
1975,  Chapter  9). 

The  same  mathematical  model,  but  for  a  different  purpose,  is 
encountered  in  the  theory  of  competing  risks  in  biometry.  Two  very 
good  reviews  of  the  literature  on  this  topic  are  Birnbaum  (1979)  and 
Moeschberger  and  David  (1978).  In  this  situation  there  are  postulated  to 
be  K  different  mutually  exclusive  and  collectively  exhaustive  causes  or 
risks  which  compete  to  cause  the  eventual  failure  of  the  mechanism  under 
study.  The  mechanism  is  assumed  to  function  until  at  least  one  of  the 
causes  or  risks  precipitates  system  failure.  The  model  is  fairly  general; 
different  components  in  a  complex  electrical  network  may  "compete"  for 
equipment  failure,  different  diseases  may  "compete"  for  a  life,  or 
different  pathways  may  compete  for  a  chemical  reaction.  Again, 

dependence  of  the  risks  is  often  present  (e.g.  systematic  diseases). 

The  common  mathematical  formulation  of  the  multiple  decrement  (competing 

risk)  model  which  we  shall  investigate  postulates  a  lifetime  X  for  the 
tTi  tli 

iC  person  (i  risk  factor  acting  alone).  The  random  variables 
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(X^,XZ>  ...  ,  X^)  are  called  the  net  lives  corresponding  to  the  k  postulated 
possible  causes  of  system  failures.  The  net  lives  may  be  potentially 
observable  random  variables  as  in  the  multiple  decrement  model,  and  in  the 
competing  risk  formulation  for  complex  electrical  equipment,  or  they  may  be 
a  theoretical  mathematical  construct,  as  in  the  case  of  several  diseases  com¬ 
peting  for  lives,  or  a  survival/censorship  model.  The  actual  life  is 

defined  to  be  the  observed  time  of  system  failure,  Z  =  min  X. 

1  <1  <1< 

Usually  another  variable  J,  the  cause  of  failure  is  also  observed,  J  =  i  if 
Z  =  X^.  The  pair  (Z,J)  is  called  the  identified  minimum.  Thus,  if 
Sx(x)  =  P  [X^  >  x^,  . . .  ,  X^  >  x^]  is  the  joint  survival  function,  then 
Sz(t)  =  (t,  ....  t)  is  observable,  as  are  the  so-called  "crude  life 

probabilities"  Qi(t)  =  P [Z  >  t ,  J  =  i]  i  =  1 ,  2 ,  ...  K. 

On  the  other  hand,  the  joint  survival  function  Sx(x)  is  not  directly 
estimable.  Knowledge  of  would  completely  determine  the  relationships 
between  the  competing  risks  (or  insured  lives  in  the  multiple  decrement 
life  table).  The  ident if  lability  problem,  then,  is  to  reconstruct  from 

known  Sz(t)  and  Q  (t)  ,  i  =  1,  2,  ...  ,  K. 

In  section  two  we  shall  review  the  literature  and  previous  contri¬ 
butions.  In  section  3  we  introduce  a  bivariate  model  with  dependence  and 
marginal  distributions  of  the  Makeham  type.  This  may  be  useful  since  the 
Makeham  law  adequately  fits  univariate  life  table  date,  and  consequently 
this  model  provides  a  dependent  alternative  to  the  commonly  used  technique 
in  multiple  decrement  life  table  analysis  which  assumes  independent  Makeham 
marginal  distributions  (c.f.,  Jordan  1975). 

In  section  4  we  introduce  dependent  models  which  are  scale  mixtures 
of  independent  random  variables,  and  prove  identif lability.  These  models 
include  dependent  proportional  hazards  models,  multivariate  Pareto  distri¬ 
butions,  and  also  a  multivariate  logistic  law. 
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§  2.  Review  of  the  literature 


The  proofs  of  most  of  the  results  of  this  section  can  be  found  in 
the  review  papers  by  Birnbaum  (1979)  and  Moeschberger  and  David  (1978). 
These  results  are  stated  for  completeness,  and  to  put  our  contribution 
in  context. 

The  first  contribution  to  the  subject  was  by  D.  Bernoulli  (1760) 

who  estimated  mathematically  the  effect  on  the  average  life  expectancy 

which  would  obtain  if  the  risk  factor  small  pox  was  eliminated  from  the 

population.  Naturally,  Bernoulli  did  not  quite  frame  the  problem  as  we 

have.  In  the  modern  guise,  the  first  contribution  to  the  identif lability 

problem  was  that  of  Berman  (1963).  He  proved  that  if  ,  Xz  >  ...  ,  are 

independent,  then  F  is  uniquely  identifiable  from  the  knowledge  of  the 

identified  minimum.  In  particular,  he  shows  that 
k  x, 

Sv(x)  =  IT  exp  {-/  1  r  (t)dt) 

?  '  i=l  0 

where  r^(t)  =  Q^’(t)/Sz(t)  i  =  l,2,...,k.  Thus,  the  usual  actuarial  prac¬ 
tice  of  assuming  independent  lives  for  multiple  decrement  analysis  can  be 
based  on  identif lability . 

For  the  case  of  dependent  net  lives  much  less  has  been  done.  Tsiatis 

(1975)  showed  that  in  the  general  case,  S.  is  not  uniquely  identifiable 

X 

from  knowledge  of  the  identified  minimum.  Thus,  there  can  be  several 
joint  survival  functions  with  the  same  values  for  Sz(t)  and  Q  (t)  , 
i  =  l,...,k.  In  particular  he  proves  the  following  lemma. 

,  dQ  ft) 

Lemma  1  (Tsiatis  1975)  For  the  competing  risk  model,  Q ’ ( t )  =  —  = 

^  dt 


at  any  point  t  at  which  the  derivative  exists. 

Xi=x2...=xk=t 
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It  is  clear  from  this  lemma  that  the  knowledge  of  the  identified 
minimum  can  only  determine  S  and  its  derivatives  along  the  line 

A 

X^=X2=...  =X^=t,  and  generally  this  is  not  sufficient  to  uniquely  deter¬ 
mine  S  (x) .  (Note  that  this  is  a  variant  of  the  famous  Cauchy  problem 

A  " 

in  partial  differential  equations,  a  topic  of  considerable  study  c.f. 

Copson  [1975].) 

Since  the  general  dependent  model  cannot  be  identified  uniquely, 
and  since  the  independence  model  is  often  unrealistic  (e.g.  when  study¬ 
ing  systemic  diseases,  or  when  investigating  the  relationship  between  net 
lives) ,  it  is  natural  to  assume  S  belongs  to  a  flexible  parametric 
family  and  to  attempt  to  establish  identif iability  within  this  family. 

This  parametric  approach  has  the  advantage  of  requiring  no  information 
other  than  the  observable  distributions  S„(t)  and  Q.(t)  i  =  l,,..,k,  and, 

Li  1 

of  course,  the  parametric  family  involved.  Unfortunately,  not  every 
popular  multivariate  family  of  distributions  is  identifiable  (see  e.g. 
example  2.2). 

The  next  step  in  the  development  of  this  subject  then  is  the  formu¬ 
lation  of  identifiable  dependent  multivariate  models.  The  development  of 
this  aspect  of  the  theory  has  progressed  in  a  somewhat  adr-hoc  manner,  as 
the  following  examples  show.  The  first  three  models  are  bivariate  ex¬ 
ponential  models. which  bivariate  extension  is  desired  depends  upon  which 
of  the  properties  of  univariate  exponentials  we  wish  to  preserve.  Almost 
all  the  models  assume  there  can  be  no  ties  for  the  cause  of  death  variable  J, 
i.e.,  failure  is  due  to  a  simple  cause. 

Example  2,1  Assume 

S,.(x1,x2)  =  exp{-c1x1-c2x2-c3x1x2]  ,  x1?x2  >  0 

where  c^,  c2,  c^  are  non-negative.  Then  Sj,(x)  is  uniquely  determined  by  S^, 
and  Q2>  This  is  the  simplist  bivariate  exponential 
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Example  2.2  (j51ock-Ba&u  Bivariate  Exponential! 
Assume  S  is  given  by  its  density 

A 


f(x1,x2)  = 


(  Cc]/c2+c3^ 

Cl+C2 


j  Cc2^cl+c3^  exp  (-(c^+c.jix^-^x,^ ;  >  x2 

1  r-  4 -r*  i  I 


exp 


Vc1x1-(£2+c^)x2|  x^  <  x2 


Cl+C2 


V  0  for  x^  or  x,  <  0 


where  C  =  c^+c^+c^,  c^.c,,  >  0,  >_  0 


Then  S  is  not  identifiable  (see  Basu  and  Ghosh  (1978)).  This  model  retains  the 

S 

exponential  "lack  of  memory"  property,  but  does  not  have  exponented  marginals. 
Example  2.3  (Downton  Bivariate  Exponential) 


Assume  S  is  given  by  its  density  function 
A 


C1C2 


f(x1,x2)  =  yT~  1 


Lo( 


2  /pc1c2x1x2 

• 


r 

)  exp  | 

I  c1x1+c2x2 

1-p  f 

L  > 

for  X2X2  —  °>  ci,c2  51  °»  0  £  p  1  where  Iq  is  the  modified  Bessel  function 

of  the  first  kind  of  order  0.  Then  Sx  is  identifiable.  (See  Downton  1970),  It 
has  exponential  marginals,  and  arises* from  a  non-fatal  shock  model. 

Example  2.4  (Bivariate  Normal) 


Assume  that  is  given  by  the  density 


f  (x^) 


then  Sv  is  identifiable.  See  Nadas  (1971)  for  a  proof,  and  Basu  and  Ghosh 

A 

(1978)  for  further  comments. 


i 
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We  may  also  account  for  simultaneous  causes  of  failure  hy  the 

k 

following  mathematical  mechanism.  For  each  vector  se{0,l}  let  Qg(t)  = 
P(Z>t,  J  =  s) ,  i.e.  Qg  is  the  probability  that  the  life  exceeded  t  and  the 
cause  was  simultaneously  i^,  ...  ,  i^  where  the  i^'s  are  the  coordinates 
of  s  equal  to  1.  In  this  framework  we  can  identify  2  vl  "causes"  of  death. 
In  this  setting  the  Marshall^-Olkin  k  dimensional  distribution  given  in  the 
next  example  will  corresponding  to  2  ^1  independent  exponential  emitting 
risks.  This  is  then  identifiable  from  Berman’s  results. 


Example  2.5  (Marshall-Olkin  bivariate  exponential) 

Assume 

Sx(x1,x2)  =  exp  -c^tC^ttC-j  max(x1>x2) 
forXl,x2  >0,  where  c^  c2  >  0  and  c3  >_  Q.  Then  is  identifiable. 


The  Marshal l-01kin  exponential  has  exponential  marginal  distributions,  and 
also  the  "lack  of  memory"  property  SCxj+t.x^t)  =  S(x,,x2)S(t  ,t)  . 

They  also  show  that  any  bivariate  distribution  with  both  these  properties 
cannot  be  alsolutely  continuous. 

These  are  the  parametric  classes  which  have  been  investigated  so  far, 
and  the  proofs  of  identifiability  do  not  always  yield  constructive  methods 
for  parameter  estimation.  For  example,  the  Downton  exponential  and  the 
bivariate  normal  are  shown  to  be  identifiable  by  showing  two  distinct  such 
distributions  cannot  have  the  same  values  for  Sz(t)  and  (^(t)  i  =  1,  2. 

Thus,  although  the  parameters  are  known  to  be  uniquely  determined,  no  recipe 
is  given  for  calculating  these  parameters.  In  the  next  two  sections  we 
introduce  some  parametric  classes,  prove  identifiability  and  explicitly 
give  parameter  estimates. 
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§.3  A  Dependent  Bivariate  Makeham  Distribution 

In  his  original  paper  of  1825,  in  which  he  derived  the  survival 
model  which  bears  his  name,  Gompertz  assumed  "the  average  exhaustion  of  a 
man's  power  to  avoid  death  to  be  such  that  at  the  end  of  equal  infinitely 
small  intervals  of  time  he  lost  equal  portions  of  his  remaining  power  to 
oppose  destruction  which  he  had  at  the  commencement  of  these  intervals." 

Using  the  reciprocal  of  the  hazard  function  as  a  measure  of  'ability  to 
oppose  destruction"  gave  a  differential  equation  for  the  hazard  function 
h(t)  whose  solution  yields  the  survival  function  exp  (-dexp(st)}.  Gompertz 
recognized  that  death  might  be  due  to  two  causes,  chance  and  deterioration, 
but  Makeham  (1860)  was  the  one  to  make  the  requisite  constant  addition  to 
the  hazard  function. 

If  we  introduce  dependence  into  a  bivariate  Makeham  model  by  adopting 
a  Marshall-Olkin  approach  we  are  led  to  the  survival  function 

S(x,y)  =  exp{-c1x-c2y-c3  max(x,y)  -d1  expts-^x)  -  d2  exp(s2y)>  (3.1) 

The  rationale  of  this  model  is  as  follows.  Let  denote  the  time  to  death  of 
person  i  due  to  deterioration  alone,  a  Gompertz  random  variable  with  para¬ 
meters  d^  and  s^,  i  =  1,  2.  Random  shocks  or  accidents  occur  according  to 
Poisson  processes.  The  intensity  parameter  of  the  process  N^(t)  of  shocks 
which  are  fatal  only  to  person  i  is  c^,  i  =  1,  2,  and,  due  to  the  simultaneous 
exposure  of  the  two  persons,  there  is  an  intensity  c^  for  random  accidents 
like  automobile  crashs  which  could  be  fatal  to  both  individuals  simultaneously. 
Let  N^(t)  be  this  third  Poisson  process.  All  the  random  variables  W^,  W2,  N^(t) , 
N2(t)  and  N^(t)  are  assumed  to  be  independent.  Then  if  X  is  the  actual  life  of 
person  1,  and  Y  is  the  actual  life  of  person  2,  it  follows  that 
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S(x,y)  =  P[X>x,Y>y]  =  PlW^x.N^x)  =  0,  W2>y,  N2(y)=0,  N^CmaxCx.y)  =  0] 

=  exp{-d^exp(s^x) }exp{-c^x}exp{-d2exp(s2y) )exp{-d2y}exp{-c2max(x,y) } 
which  is  (3.1). 

Note  that  S(x,y)  is  indeed  a  bivariate  Makeham  distribution  in  the 
sense  that  the  univariate  marginal  distributions  are  both  Makeham  distribu¬ 
tions.  In  addition,  if  d^  =  d,,and  s^  =  s2  (so  the  Gompertz  "deterioration" 
components  are  the  same  for  the  two  individuals),  which  is  a  common  assumption 
in  actuarial  calculations,  then  even  the  observed  minimum  of  X  and  Y  has  a 
Makeham  distribution  as  well. 

The  next  theorem  shows  that  this  bivariate  Makeham  is  identifiable.  Note 
that  this  does  not  follow  from  Berman's  result  since,  for  example  if  person  1 
dies,  we  do  not  know  whether  deterioration  or  chance  was  at  fault,  and  hence 
do  not  have  an  identified  minimum  for  the  formulation  in  terms  of  the  independent 
variables  W^,  W2,  N^,  N2 ,  even  though  it  is  an  identified  minimum  problem 
in  the  X,Y  formulation. 

Theorem  1  The  bivariate  survival  function  (3.1)  is  identifiable. 


Proof  First  we  note  that  for  this  survival  function  which  is  not  differ¬ 

entiable  along  the  x  =  y  line,  we  must  modify  the  Tslatis  result  (Lemma  1) 
to  read 


Q  (t)  =  lira 
xlt 


IT  s(x,t)  1  and 


Ql  (t)  =  lim 
y+t 


8x 


S(t,y) 


The  proof  of  this  result  is  essentially  the  same  as  that  of  lemma  1.  We 


calculate  for  (3.1). 
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(t)  =  {-c^-d^s^exp(s^t) }  S(t,t)  i  =  1,  2. 

We  note  that 

d  Qi(t)  2 

ip  (t)  =  -i—  —^-7 - r  =  -d.s.  exp(s.t)  is  observable, 

i  dt  S(t,t)  ill 

f  log  (-  ip.  (t))i 

and  lim  \  - >  =  s,,  i  =  1,  2,  so  that 

t-» 00  l  t  ]  1 

n 

Sj  and  s^  are  obtainable.  Using  the  known  s.^  gives  d^  =  ip^(O) /s  ./'•  Once 
s^,d^  are  known,  follows  from  Q^(t)/S(t  ,t) .  Now  that  ,C2 ,s^ , S2 ,d^ ,d^ 
are  all  known,  follows  from  S(t,t)  =  S^(t)  and  all  parameters  have  been 
uniquely  determined. 
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§4.  Scale  mixture  models. 


Many  parametric  families  of  k-dimensional  distributions  can  be 
characterized  as  scale  mixtures  of  distributions  with  independent  marginals. 

Thus  we  begin  with  ...  ,  U^)  independent  random  variables  and  take 

W  >  0  independent  of  the  U  's.  The  random  vector  X  =  (X..  ,  ...  ,  X.)  where 

1  %  i.  K 

X.  =  ll/W  has  a  distribution  describable  as  a  scale  mixture  of  distributions 
l  i 

with  independent  marginals.  Such  mixtures  mieht  arise  »-’hen  there  is  differential 
susceptibility  to  dying  from  different  causes  in  a  population,  for  example,  if 
causes  of  death  are  closely  related  to  diseases  that  are  genetically  controlled 
in  a  heterogeneous  population.  Not  every  such  scale  mixture  will  be  identifiable  . 
However,  if  we  assume  that  the  U^'s  are  independent  with  proportional  hazards 
then  we  can  get  identif iability . 

The  following  result  generalizes  the  result  of  Elandt-Johnson  (1976)  and  also 
Elandt-Johnson  and  Johnson  (1980)  §9.9  since  we  do  not  require  the  X^'s  to  have 

proportional  hazard  functions,  but  only  that  the  X± |w  have  proportional  hazards. 

Theorem  2  Suppose  conditional  on  W  =  X,  X^  are  independent  with  proportional 
hazards  h(t),  so  that  the  joint  survival  function  is 

k 

S  (x)  =  /  II  exp{-5  XH(x.)  JdG(X)  with  H(x)  =  ix  h(t)dt. 

~  ~  i=l  1  1  0 

Where  G  is  the  distribution  function  of  random  variable  W. 

(a)  if  h  is  known  then  S>;  is  identifiable  in  {6  }  and  G. 

(b)  if  G  is  known,  then  Sy  is  identifiable  in  {o  }  and  h 

Proof 

(a)  We  first  note  that  by  changing  the  scale  of  G  appropriately  we  may 
assume  without  loss  of  generality  that  6^=1,  and 
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I  I 

S  (x)  =  I  exp  S-  X  E  6  H(x  )  dG(X) 
(  i=l  ! 

k 

=  Lr  (  E  6  H(x  )) 

G  i=l  1  1 


(4.1) 


where  L  denotes  The  Laplace-Stieltj es  transform  of  the  measure  dG.  Thus, 
G 

from  the  above,  and  Lemma  1  we  have  for  i  =  1 . k. 


Qi<t0) 


as. 


3x , 


=  Vco 


^0^  !  XexpJ-XH(tQ) 


k  I 

E  6  \  dG(X)  . 
j*l  J 


(4.2) 


Hence,  Q|(tg) /Q|(tp)  =  6^,  i  =  1,  2,  ...  ,  k.  Now,  with  the  knowledge  of 
the  d^’s,  and  of  Sz(t)  =  S^(t) ,  we  know  the  Laplace-Stieltjes  transform  of 

k 

G  on  the  curve  H(t)  E  5..  Differentiability  implies  there  is  a  limit 

j=l  J 

point  on  this  curve,  so  that  the  analyticity  of  the  Laplace  transform 
implies  this  determines  G,  and  hence  by  integration  with  known  (d^),  we 
have  determined  S  , 

'  -Ik 

(b)  Since  G  is  assumed  known,  from  (4.1)  we  have  L  (S  (t))  =  H(t)  E  d, 

G  Z  i=l  1 

As  before,  the  6^’s  are  uniquely  determined  from  Q^(t)/Q^(t),  so  the  above 
formula  determines  H,  and  hence  h. 

Example  4.1  Suppose  conditional  on  W  =  X,  X^  , . . .  ,  X^  are  independent  Weibull 

C  ” '  1 

variables  with  hazard  h(x^)  =  cd^Xx^  ,  c  known.  Then  unconditionally 

k  c  /  k 

S  (x)  =  1  TI  e  'rj5i*xi  dG(X)  =  Lp  [  E  d  x*!  , 

~  '  i=l  G  \i=l  1  1  • 

is  an  identifiable  dependent  competing  risk  model,  (again  G  is  the 
distribution  function  of  W) . 
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Example  4.2  Suppose  the  joint  survival  function  of 
variate  Pareto,  i.e.,  has  the  survival  function 


V?> 


k 

1+  T. 
1=1 


for  x1  >  Pj 


(xx, 


, , .  ,  X,  )  is  multi- 
*  k 


(Such  distributions  are  called  Pareto  (II)  in  Arnold  (1981))  then 
is  identifiable. 


Proof  We  note  first  that  Q^(t)  =  0  if  and  only  if  t  p^,  so  the  Q/s 
determine  the  p^'s.  Knowing  {p^l  allows  us  to  change  variables,  =  x^-p^ 

to  reduce,  without  loss  of  generality,  to  the  p  =  0  case. 

Now,  the  Laplace  transform  of  a  r(a,l)  variable  is  (1+u)  a  ,  so  that  by  (4,1) 
S  (x)  represents  the  unconditional  joint  survival  function  corresponding  to 

A 

,  ...  ,  Xfc  being  conditionally  independent  exponential  variables  with 
hazard  functions  wx^/c^  conditioned  on  W=w  where  W  has  a  T(a,l)  distribution. 
From  (4.2)  we  have 


so  that 


Let 

and 


q:(t)  =  -  (1  +  t  I  l/o  )~a  1 

i  j=l  J 


i  !  /  k 

Q'(t)/S„(t)  =  i  i  -  a/ (1+t  X  1/a.) 
1  z  °i  I  /  i=1  j 

=  Q|(0)/Sz(0)  =  -a/a1  i=l ,  ...  ,  k 
n  =  Qi(l)/sz(l)  =  +  £  al^°j^ 


.  i  =  1,2, ...k. 


and  we  have 


k+1  equations  in  k+1  unknowns  which  are  easily  solved  for  a.a^,  ...  ,  a^. 

Thus  the  multivariate  Pareto  (II)  Is  identifiable  and  an  explicit  representation 
for  the  parameters  is  given. 
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If  the  X^’s  are  known  functions  of  random  variables  which,  given  W, 
are  independent  with  proportional  hazards,  we  may  again  conclude  identifi- 
ability.  As  an  illustration,  we  may  consider  scale  mixtures  of  independent 
Weibull  laws  with  known  but  arbitrary  shape  parameters.  It  turns  out  however 
that  for  scale  mixtures  of  Weibull  distributions,  we  do  not  necessarily  have 
to  know  the  shape  parameters  to  conclude  ident if  lability.  Thus: 

Theorem  3  Suppose  that  conditional  upon  W=w,  X^ ,  ...  ,  X^  are  independent 

Weibull  variables  with  survival  distributions  Sv  (x.)  =  exp{-6  wx  ai} 

i  i  1 

respectively.  Then  the  unconditional  joint  survival  function  is 

S  (x)  =  s  II  exp{-6  wx.ai}  dG(w)  , 

~  0  i=l  1  1 

A  sufficient  condition  that  S x  be  uniquely  determined  by  the  knowledge  of 
the  identified  minimum  is  that  E(WY)  <  00  for  some  known  y  >0, 

Proof  First  note  that  if  E(WY)  <  ®  for  a  known  value  of y,  then  by  taking 
y*”*1  roots  of  WX  ,  .  .  .  ,  WX  we  obtain  a  scale  mixture  of  (new)  Welbulls 

V 

with  a  (new)  scale  mixing  variable  W  Y  which  has  a  finite  first  moment. 
Thus  it  suffices  to  prove  the  theorem  in  the  case  y  =  1 


From  now  on  we  assume  EW  <  °°.  Knowledge  of  the  identified  minimum  yields  the 
functions  ,t , . . .  ,t)  and  Qi(t)  ,  i  ■  1,  2,  ...  ,  k. 


K. 

Jy  S  (t.t,  ....  t)  =  i  G(  I  1),  where  L  p  denotes  the  Laplace-Stielt jes 


transform  of  G  and  from  Lemma  1. 


f.,s. 
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q;<t) 


00  a.-l  ai 

/  -6  a  wx  exp{-w£S  x  }dG(w) 

a  ill  i  i 


'V'-Vt 


k  a , 


i  00  i 

5  at  /  w  exp{-w  £  5  t  }dG(w) 

*  *  r\ 


“I1 

=  1  L,G(z61t  n 


Consider 


h1(t)  =  in  { tQ^(t) /  sx(t,,..,t)  =  -to  (6^)  +  a±lnt  +  £nUt) 

ai  al 

where  k(t)  =  L'  (U  t  )/L  (16  t  ).  Then  lim  h  (t)/£nt  =  a,  since 

G  1  G  1  t-0  1  1 

ink(t)/int  ■*  0,  To  see  this  note  that  int  -  as  t  ■+•  0  and 

«i 

L'  gdd^t  )/L  gdd^t  )  -*■  Pg,  the  mean  of  G  f  assumed  finite. 

Thus  {a.j}  can  be  recovered.  Without  loss  of 
generality  we  may  assume  6^=1  (since  otherwise  G  may  be  rescaled  accordingly) . 

QI(1)  6iai 

Now  Q-r^y  =  r  —  ,  so  the  6's  may  then  be  uniquely  obtained.  Knowing  (a^),  and 

k  at 

(6  }  implies  we  know  the  Laplace  transform  of  G  along  the  curve  £  {  t  , 

i=l  1 

Since  the  Laplace  transform  is  an  analytic  function,  this  determines  G. 

Knowing  G,  {6^}  and  (a^)  uniquely  determines  S^(x)  in  the  class  of  mixtures, 

and  exhibits  identif lability . 

Example  4.3  Suppose  (X^,...,X^)  follow  a  multivariate  Pareto  (IV)  law, 
i.e.  their  joint  survival  distribution  is 


k  f  Xi~Ul)  1/yl  ~B 

v*>-  1  *  M  V) 


>  Pi,  i  =  1 . k 
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Then  Sx  Is  identifiable. 

Proof  Since  Q^(t)  =  0  if  and  only  if  t  n  ,  we  know  p^,  i  =  l,...,k 
uniquely,  and  without  loss  of  generality  we  may  take  p^  =  0. 

Again,  as  in  Example  4. 2,  we  recognize  the  Laplace  transform  of  the  gamma 
law  and  observe  that  the  multivariate  Pareto  (IV)  is  a  gamma  scale  mixture 
of  conditionally  independent  Weibull  laws,  with  6^  =  1 /o  and  =  1/y^ 
to  fit  the  notation  of  the  previous  theorem.  By  Theorem  3  it  follows  that 

SY  is  identifiable.  In  fact  we  may  estimate  the  exponent  of  the  gamma  scale 

'  .  t  1/Y.  \ 

mixing  distribution  quite  readily  since  ?  =  ~ZnS?(.t)  / Zn  1  +  E  )  1  J 

The  multivariate  Pareto  laws  are  particularly  beautiful  and  useful 
multivariate  laws  for  reliability  and  survival  time  modeling.  In  addition 
to  the  scale  mixture  of  Weibull  characterization  above  (see  also  Takahasi 
1965),  it  is  true  that  for  such  Pareto  laws,  the  k-dimensional  marginals  of  an 
n-dimensional  Pareto  are  again  Pareto  for  any  1  <_  k  n,  and  also  the 
conditional  distribution  of  any  k  dimensional  marginal  given  the  remaining 
fn-k)  variables  is  again  Pareto.  These  are  the  types  of  properties  which 
are  also  enjoyed  by  multivariate  normals,  but  Pareto  laws  do  not  have  the 
mathematical  intractability  and  possible  negative  values  for  survival  times. 


Naturally,  any  known  function  of  an  identifiable  competing  risk  model 
is  also  identifiable,  so  taking  the  logarithm  of  the  crude  life  variables 
in  the  Pareto  (III)  distribution  (i.e.  P  (IV)  with  8=1)  yields  identifia- 
bility  of  the  multivariate  logistic  distribution.  (In  fact  the  Pareto  (III) 
law  is  sometimes  referred  to  as  the  log-logistic  distribution.)  This  is 
important  since  the  logistic  distribution  function  so  closely  resembles  the 
normal  distribution  function.  The  lognormal  distribution  is  not  always 
tractable  mathematically,  however  the  Pareto  often  is  easy  to  work  with. 


I 
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THE  UNDERWRITING  RISK  AND  RETURN  PARADOX  REVISITED 


Patrick  L.  Brockett  and  Robert  C.  Witt 

Recently,  Witt  [1,2]  suggested  that  one  would  expect  to  find  a  negative 
or  inverse  relationship  between  mean  state  loss  ratios  and  standard  deviations 
of  the  associated  annual  loss  ratios  by  state  for  automobile  insurance  if 
underwriting  risk  (as  measured  by  the  standard  deviation  of  loss  ratios  in  a 
state)  were  properly  included  in  insurance  rates,  other  things  being  equal. 
However,  based  on  an  empirical  test  of  this  hypothesis,  he  found  a  direct  rather 
than  an  inverse  relationship  between  these  variables.  Hedges  [3]  later  argued 
that  the  observed  positive  relationship  could  have  resulted  from  a  mathematical 
relationship  between  the  average  loss  ratios  and  standard  deviations  and/or 
from  certain  institutional  factors  which  affect  loss  ratios  and  insurance  rates 
in  the  various  states. 

Based  on  some  empirical  tests  of  economic  and  institutional  factors,  Witt 
[A]  showed  that  Hedges'  institutional  hypothesis  could  not  be  verified.  His 
regression  results  for  the  whole  industry  showed  a  positive  relationship  between 
the  underwriting  risk  and  return  variables  even  after  he  included  various  economic 
and  institutional  variables  in  the  regression  model.  The  regressions  for  the 
direct-writing,  national-agency,  and  regional-speciality  companies  also  revealed 
a  positive  relationship  between  the  means  and  standard  deviations  of  loss  ratios 
by  state.  Thus,  the  addition  of  some  institutional  variables  to  his  regression 
model  did  not  reverse  the  significant  positive  relationship  between  mean  loss 
ratios  and  standard  deviations  by  state.  Accordingly,  he  argued  that  the  per¬ 
verse  observed .positive  correlation  could  not  be  easily  explained  by  differences 
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in  institutional  factors  among  states.1 

In  regard  to  Hedges'  mathematical  hypothesis,  Witt  [A]  argued  that  if  the 
annual  loss  ratios  by  state  are  normally  distributed  there  should  be  no  rela¬ 
tionship  between  the  averages  and  the  standard  deviations  of  these  ratios 
because  there  is  no  mathematical  relationship  between  the  mean  and  variance  of 
a  normally  distributed  random  variable.  Witt  indicated  that  the  Central  Limit 
Theorem  would  suggest  that  the  annual  loss  ratio  in  a  state  should  be  normally 
distributed  because  it  is  nothing  more  than  the  weighted  average  loss  ratio  of 
all  the  companies  writing  a  given  coverage  in  the  state.  Hence,  he  argued  the 
positive  correlations  he  observed  could  not  be  reasonably  explained  by  the 
mathematical  relationship  postulated  by  Hedges.2 

In  Dr.  Vene2ian's  thoughtful  comment,  he  correctly  notes  that  the  Cent 
Limit  Theorem  per  se  is  not  strong  enough  to  imply  that  the  sample  mean  and 
standard  deviation  (or  variance)  will  become  asymptotically  uncorrelated,  or 
independent,  even  though  this  is  basically  a  unique  characterisitic  property  of 
the  normal  limit  law.  However,  in  Witt's  empirical  work,  a  double  Central  Limit 
Theorem  is  in  effect,  and  this  effect  is  sufficient  to  imply  that  the  correla¬ 
tion  between  sample  means  and  standard  deviations  should  be  close  to  zero. 
Basically,  this  is  the  thrust  of  the  heuristic  argument  given  by  Witt  [4].  A 
precise  mathematical  justification  for  this  argument  will  be  given  below. 

We  will  also  show  under  certain  conditions  that  even  treating  premiums  as 
random  variables  does  not  qualitatively  change  the  results.  This  demonst ration 
will  help  to  amplify  the  remarks  of  Witt  and  Venezian.  In  the  final  section, 
we  will  explore  other  possible  explanations,  testable  hypotheses  and  potentially 
promising  avenues  for  further  research  on  the  issue  of  the  proper  relationship 

1Witt  [9,  pp.  657-661]. 

2Witt  [9,  pp.  653-654]. 
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between  underwiting  risk  and  return.3 


Correlation  and  the  Central  Limit  Effect 


Let  denote  the  loss  on  policy  k  in  state  i  and  year  t  and  denote 

the  corresponding  premium.  Here  k  =  1 ,2 , . . .  ,N  ;  i=l,...,51;  and  t=l,...,T; 
where  N  is  the  total  number  of  policies  written  by  all  insurers  in  year  t  in 

state  1.  The  loss  ratio  for  all  insurers  in  state  i  during  year  t  is  as  follows 


"it 


■it  8lt 


T  T 

—  2  —  2 
As  usual,  Z.  =  L  Z.  / T  and  S .  =  I  (Z.  -Z.)  /T  are  the  temporal  sample 

1  t=l  lt  1  t=l  1 

moments  for  state  i  for  the  T  years  analyzed.  The  discussion  centers  around 

—  2  , 

the  correlation  of  (Z^.S^  )  i =  1 ,2 , . . . , 51 . 4  To  simplify  notation,  for  an 
arbitrary  random  variable  W,  we  shall  let  y(W)  denote  the  mean  and,  for  j  >  2, 
p^(W)  =  E[W  -  y(W)  ]3  denote  the  jth  central  moment  of  W.  As  noted  by  Venezian, 
we  have. 


E(Z)  = 


u2(z) 


V2(Z) 


ECS2  ) 


y2(s^) 


—  w2(z)  =  y2(z) 

y4(Z)  -  y22  (Z) 


3The  loss  ratio  in  a  state  was  used  as  an  inverse  measure  of  underwriting 
return  by  Witt  [1,2], 

**It  should  be  noted  that  Witt  [1,2]  used  loss  ratio  data  for  a  series  of  years 
for  all  states  and  the  District  of  Columbia,  or  51  jurisdictions. 
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and  most  importantly, 
Cov(I,S2)  = 


u3(z) 

T~~ 


All  of  the  above  formulas  include  terms  up  to  order  1/T,  and  can  be  found,  for 
example,  in  Kendall  and  Stuart  [5,  pp.  243-246],  The  main  point  is  that  all  the 
second  order  moments  including  the  covariance  term  are  of  order  1/T,  and  hence, 
when  taking  the  correlation  coefficient,  the  1/T  drops  out,  leaving  the  following. 


Now,  the  point  here  is  that  u^CZ)  should  be  close  to  zero  because  of  a 
second  central-limit  effect.  As  shown  above,  the  1/T  terms  cancel.  However, 
the  dependence  on  N  does  not.  Thus,  this  result  does  not  depend  in  any  way  on  the 
non-randomness  of  premiums,  p  ^ ,  as  suggested  by  Venezian. 

Let  us  now  calculate  the  ratio  in  (1)  for  the  central  moments  of  Z  in  terms 
of  the  central  moments  of  the  original  loss  and  premium  variables  X  and  P. 

The  vehicle  we  will  employ  in  this  analysis  is  the  multivariate  version 
of  Taylor's  expansion,  as  specified  in  (2). 

(2)  f(x,y)  =  f (a,b)  +  f  (a,b)(x-a)  +  f  (a,b)(y-b)  +  R 

•  x  y 

where 


{fxx(Vy0)(x-a)  +2f  (x0,y0)(x-a)(y-b)  +  f  (xQ  ,yQ)  (y  -  b)  } 


is  the  remainder  term,  and  (xq>Tq)  is  a  point  on  the  line  segment  joining  (x,y) 
to  (a,b).  By  dividing  both  the  numerator  and  denominator  of  Z  by  N^t,  we  obtain 
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the  ratio  of  average  losses  to  average  premiums: 


'iC  "  Pit  ’ 


and  by  choosing  f (x,y)  =  x/y ;  a=u(X)  =  p(X);  b  =  p(P)  =  p(P)  ,  we  obtain  (after 
calculating  the  appropriate  partial  derivatives)  the  result  specified  in  (3) 


(3)  zit  ■  +  ^  (x-‘,(x)>  --TZT  <p-',<p))  +  * 

v  (P) 


where  R  is  the  random  remainder  term  shown  below: 


R  =  — —  (X  -  p(X))(P  -  y(P))  +  (P  -  y(P))2 

P  P  1 

0  0 


and  (Xq.Pq)  is  a  point  between  (X,P)  and  (p(X) ,p(P))  .  Assuming  fourth  moments 
exist,  one  can  show  via  Tchebychev's  Inequality  that  /n ^ t * R  converges  in 
probability  to  zero  as  N  increases  without  bound,  and  consequently  from  (3) 
that  /fi'  is  asymptotically  normal  for  each  (i.t).  For  convenience,  we 

now  drop  the  subscript  (i,t). 

Ignoring  terms  of  higher  order  in  1/N  yields  the  obvious  approximation 
to  (3)  below. 


1  ’•>  ®  +RpT<x-"<»)  -7^ 


The  moments  of  Z  may  now  be  read  off  from  (4)  and  plugged  into  (1)  after 
noting  that  Z  -  E(Z)  =  W  where 


W,  =  “T^T  (X.,  -  p(X))  — (P.,  -  p(P)) 

k  y(P)  lkt  pZ(P)  ikt 
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k  *  1,2, and  W^  has  mean  zero.  Thus,  we  obtain 


U2(Z)  =  p2(w) 


n2(w) 

N 


M,(W) 

Ji3(Z)  =  n3(w)  =  ~~- 


p  (W)  3N  ( N  -  1 ) 

and  P4  (Z)  =  u^(W)  =  - ^ ^ -  P2  (W)  .  5 

N  N 

2  ^  2 

We  observe  that  p^(Z)  is  of  order  1/N  and  / p2 (Z) { p^ ( Z)  -  p2  ( Z) }  is  of  order 
3  /2 

1/N  so  that  the  correlation  coefficient  in  (1)  is  of  order  1//TT.  Since  the 

number  of  policy  owners  and  companies  per  state  is  quite  large,  we  would  indeed 

—  2 

expect  the  correlation  of  Z^  and  to  be  close  to  zero,  as  Witt  has  implicitly 

claimed.  Notice  that  the  above  derivation  did  not  depend  on  the  assumption  of 
constant  premiums,  so  the  source  of  the  positive  correlation  found  in  Witt  [1,2] 
cannot  be  explained  simply  by  differing  premiums  across  companies. 


Other  Results,  Hypotheses,  and  Directions  for  Further  Research 


Witt  [1,2]  noticed  a  cyclical  pattern  in  loss  ratios  over  time,  and  Venezian 
commented  upon  the  autoregressive  nature  of  the  loss  ratios.  We  wish  to  expand 
slightly  upon  Venezian's  comment  here.  The  problem  is  that  the  premiums  in 
year  t  are  not  in  fact  independent  of  the  past  loss  experience,  and  probably 
could  be  specified  in  the  following  way: 


'  ikt 


Xyt  +  (1-  A)  I  B1Xt_1  + 


B2Xt-2  + 


+  6  X  ] 

q  t-q 


+  E. 


It  might  be  noted  that  the  coefficient  of  the  second  term  in  this  equation  could 


be  written  as  6 


/N 
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That  is,  the  premium  is  set  in  practice  by  taking  some  sort  of  a  convex  combina¬ 
tion  of  the  subjective  expected  loss  and  other  economic  factors  for  year  t,  Mt, 
and  a  regression  of  the  previously  experienced  losses  for  the  last  q  years.  An 
expense  loading,  E^,  is  added  to  the  loss  component  to  obtain  the  gross  premium. 
Frequently,  q  will  be  small  and  A  =  0.  If  A^O,  then  might  reflect  an  effect 
on  premium  size  due  to  investment  income  considerations  and  other  prior  economic 
knowledge  that  insurers  recognize  along  with  empirical  loss  and  expense  projections. 

Empirical  Bayes  estimates  for  premium  levels  fit  into  this  framework  as 
well.  In  such  situations,  the  loss  ratios  Z_*t  for  t=  1,2,...,T  are  dependent 
random  variables,  even  if  are  independent.  Indeed,  if  ^  is  quite  large, 

then  P.  would  tend  to  be  large,  and  hence  Z.  would  tend  to  be  small,  other 

it  it 

things  being  equal. 

To  formalize  the  above  mathematically,  we  can  use  the  Taylor  series  expansion 
frcm  (4)  and  regression  formulation  for  premium  determination  to  obtain  the 
following  autoregressive  formulation: 

q  _ 

(6)  Z._  =  a  +  I  b.(X„  .  -  y(X) ) 

it  t  .  ,  j  t-j 

The  coefficients  are  determined  by  collecting  the  regression  coefficients. 

Formula  (6)  offers  a  testable  model  for  analysis.  The  techniques  of  Box  and 
Jenkins  [6]  would  seem  applicable  here  for  estimating  parameters  and  determining 
the  order.  One  might  even  use  (6)  for  predicting  future  loss  ratios  and  improv¬ 
ing  management  planning  and  cash  management. 

Another  approach  to  modeling  the  cyclical  pattern  is  to  utilize  the  results 
of  Kemperman  [7]  on  oscillating  random  walks.  In  this  case,  the  loss  ratios 
behave  according  to  different  random  walks  on  either  side  of  some  specified  line 
or  real  axis.  While  in  the  upper  half  of  the  plane,  the  random  walk  would  have 
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a  tendency  to  move  down,  and  when  in  the  lower  half  of  the  plane  the  loss  ratio 
variable  would  have  a  propensity  to  move  up.  Thus,  a  cyclical  pattern  is  an 
intrinsic  part  of  the  model.  Related  models  of  interest  would  include  oscillating 
diffusion  models  studied  by  Keilson  and  Wellner  [8]  and  the  paper  on  records 
and  oscillating  random  walks  by  Aggarwal  [9]. 

Although  Witt  [1,2]  used  the  temporal  standard  deviation  associated  with 
loss  ratios  in  a  state  as  a  measure  of  underwriting  risk,  other  possible  measures 
of  underwriting  risk  could  be  developed.  For  example,  deviations  from  the  auto¬ 
regressive  patterns  specified  above  could  be  used  as  a  measure  of  underwriting 
risk,  as  Venezian  has  suggested.  Recently,  Witt  and  Miller  [10]  proposed  another 
measure  of  underwriting  risk  based  on  the  systematic  component  of  total  under¬ 
writing  risk  which  basically  captures  the  unpredictable  and  non-cyclical  component 
in  a  state’s  loss  ratio.  They  argued  that  the  variability  in  the  annual  loss 
ratios  in  a  state  is  a  measure  of  total  underwriting  risk,  which  has  two  com¬ 
ponents:  the  systematic  and  unsystematic  parts.  The  systematic  component  of 

underwriting  risk  was  determined  by  regressing  a  state’s  loss  ratio  on  the 
national  loss  ratio  during  a  given  time  period.  The  resulting  measure,  referred 
to  as  "Beta,"  shows  how  the  state's  loss  ratio  varies  with  respect  to  the 
national  loss  ratio  on  a  relative  basis.  The  remaining  variability,  which  was 
not  explained  by  the  national  loss  ratio,  was  designated  as  the  unsystematic 
component  of  underwriting  risk.  In  this  guise,  the  model  is  statistically 
similar  to  the  capital  asset  pricing  model. 

Witt  and  Miller  [10]  noted  that  systematic  underwriting  risk  for  a  line  of 
insurance  cannot  be  eliminated  by  diversification  across  states.  The  unsystematic 
component  of  underwriting  risk,  in  contrast,  could  be  reduced  or  eliminated  by 
writing  insurance  in  more  than  one  state.  Since  total  underwriting  risk  would 
tend  to  decline  as  the  number  of  states  in  which  an  insurer  sold  insurance 
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increased,  they  reasoned  that  increasing  diversification  by  state  would  gradually 
eliminate  the  unsystematic  risk  faced  by  the  insurer,  leaving  only  systematic 
or  national  market  risk.  The  remaining  systematic  risk,  they  argued,  was 
associated  with  business  risk  or  the  unpredictable  overall  performance  of  the 
economy.  Thus,  they  indicated  that  the  systematic  risk  associated  with  a  line 
of  insurance  could  not  be  eliminated  even  if  an  insurer  wrote  in  all  states. e 

Although  the  results  were  not  reported  in  their  paper,  Witt  and  Miller  [10] 
found  a  statistically  significant  positive  relationship  between  average  state 
loss  ratios  and  their  associated  measure  of  systematic  risk  for  1971  through 
1979.  They  expected  that  average  state  loss  ratios  would  vary  inversely  with 
their  "Betas,"  but  they  had  to  reject  this  hypothesis  about  underwriting  risk 
and  return  for  all  company  groups  analyzed  (direct  writers,  national-agency 
companies,  regional-specialty  companies,  and  all  companies  combined).7  Thus, 
the  use  of  a  systematic  measure  of  underwriting  risk  did  not  help  to  resolve 
the  underwriting  risk  and  return  paradox.  Perhaps,  the  development  of  a  testable 
hypothesis  including  investment  income  along  with  underwriting  income  would 
help  to  resolve  the  paradox.  It  is  possible  that  insurers  were  willing  to 
accept  relatively  low  underwriting  returns  during  the  1971-1979  time  period  in 
exchange  for  higher  anticipated  investment  returns.  By  keeping  their  prices 


fThey  use  the  systematic  measure  of  underwriting  risk  for  determining  the  impact 
of  rate  regulation  on  loss  ratios  in  states  with  different  types  of  rate  regula¬ 
tory  systems.  Basically,  they  expected  the  systematic  component  of  the  variability 
in  loss  ratios  for  a  state  to  be  greater  in  highly  regulated  rate  states  than 
in  states  with  competitive  rate  regulatory  laws.  They  found  some  statistical 
support  for  this  hypothesis.  In  general,  the  betas  in  competitive  rate  states 
were  found  to  be  lower  than  those  in  non-competitive  ones.  This  result  seemed 
to  suggest  that  insurers  faced  greater  systematic  underwriting  risk  in  highly 
regulated  or  non-competitive  rate  states  than  in  competitive  ones.  It  might  be 
useful  to  perform  a  similar  analysis  using  the  cyclical  auto-regressive  formula¬ 
tion  alluded  to  above. 

*i 

Statistical  summaries  of  these  results  can  be  obtained  from  Witt. 


relatively  low,  insurers  would  have  incurred  relatively  higher  loss  ratios  in 


order  to  improve  their  cash  flow  for  investment  purposes.  A  test  of  such  a 
hypothesis  over  a  longer  time  period  would  certainly  be  an  i.  teresting  topic 
for  future  research. 

Summary 

Witt  [1,2]  expected  to  find  an  inverse  relationship  between  mean  state  loss 
ratios  and  the  standard  deviations  of  the  associated  annual  loss  ratios  by  state 
for  automobile  insurance  if  underwriting  risk  were  properly  recognized  in  develop¬ 
ing  rates.  However,  he  found  a  positive  rather  than  an  inverse  relationship 
between  these  variables.  Hedges  argued  that  this  observed  positive  relationship 
could  have  resulted  from  a  mathematical  relationship  between  the  average  loss 
ratios  and  standard  deviations  and/or  from  certain  institutional  factors  which 
affect  loss  ratios  and  insurance  rates  differently  in  the  various  states. 

Venezian  tried  to  show  that  there  was  some  logical  support  for  Hedge's  mathematical 
hypothesis.  He  demonstrated  that  the  Central  Limit  Theorem  itself  was  not  strong 
enough  to  imply  that  the  sample  mean  and  standard  deviation  were  asymptotically 
uncorrelated.  However,  we  showed  that  a  double  Central  Limit  Theorem  was  in 
effect  in  Witt's  earlier  empirical  work,  and  this  effect  is  sufficient  to  imply 
that  the  correlation  between  the  sample  means  and  standard  deviations  should  be 
close  to  zero.  This  was  the  basic  thrust  of  a  heuristic  argument  by  Witt  in 
his  reply  to  Hedges.  A  precise  mathematical  justification  for  this  argument 
was  developed  in  this  paper.  Moreover,  it  was  shown  that  under  certain  condi¬ 
tions  treating  premiums  as  a  random  variable  did  not  qualitatively  change  the 
results  as  Venezian  thought  it  might.  As  Venezian  noted,  perhaps  other  measures 
of  underwriting  risk  might  be  used  to  better  explain  such  results.  Several 
such  measures  were  presented  here.  Perhaps,  this  discussion  will  stimulate 
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some  additional  research  on  the  relationship  between  underwriting  risk  and  return. 
The  inclusion  of  investment  return  in  a  more  comprehensive  testable  hypothesis 
might  help  to  resolve  the  underwriting  risk  and  return  paradox. 
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Density  estimates  of  surface  and  bottom  reverberation 
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Selected  surface  reverberation  and  bottom  reverberation 
returns  were  used  to  compute  estimates  of  the  probability 
density  function  of  the  instantaneous  reverberation.  To 
estimate  the  densities,  6500  samples  of  surface  reverberation 
and  3078  samples  of  bottom  reverberation  were  used.  The 
collections  of  samples  were  tested  for  randomness, 
independence,  homogeneity,  and  normality.  Both  the  surface 
and  bottom  reverberation  were  found  to  be  non-Gaussian. 
Kernel  techniques  were  applied  to  the  collections  of  samples 
to  provide  univariate  estimates  of  the  densities.  The 
densities  were  seen  to  be  nearly  Gaussian,  but  with  heavier 
tails. 


PACS  numbers: 


I.  INTRODUCTION 

It  is  commonly  assumed  in  signal  processing  for  underwater  acoustic 

applications  that  the  noise  field  is  a  Gaussian  random  process.  The 

Gaussian  assumption  allows  one  to  appeal  to  the  large  volume  of  signal 

processing  theory  devoted  to  Gaussian  noise  fields  and  often  allows 

tractable  analytical  solutions  of  the  signal  processing  algorithms.  In 

many  instances  the  assumption  of  a  Gaussian  noise  field  is  reasonable. 

However,  it  has  been  shown  that  non-Gaussian  noise  fields  can  be 
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encountered  in  situations  of  interest  in  underwater  acoustics.  It  is 
also  known  that  significant  performance  degradations  can  occur  when  a 
processor  optimized  for  the  Gaussian  noise  field  is  operating  in  a  non- 
Gaussian  noise  field. ^  To  optimize  these  processors  for  a  non-Gaussian 
noise  field,  it  is  usually  not  enough  to  know  that  the  noise  field  is 
non-Gaussian;  an  estimate  of  the  density  function  of  the  noise  field  is 
generally  required,  at  least  for  the  parametric  processing  techniques. 
For  example,  detection  algorithms  that  employ  a  likelihood  ratio  require 
some  estimate  of  the  signal  and  noise  densities. 

Previous  work  involving  a  statistical  analysis  of  underwater 
acoustic  fields  has  generally  been  limited  to  testing  for  normality  and 
has  not  provided  an  estimate  of  the  density. y^e  preseny  WOrk 
extends  this  type  of  analysis  to  include  univariate  density  estimates  of 
selected  non-Gaussian  surface  and  bottom  reverberation  returns.  These 
returns  were  formed  into  ensembles  and  each  ensemble  was  determined  to 
be  independent  and  identically  distributed.  The  ensembles  were 
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determined  to  be  non-Gaussian  by  the  application  of  several  tests  for 
univariate  normality. 

II.  EXPERIMENTAL  DATA 

The  surface  reverberation  data  were  collected  at  Lake  Travis  Test 
Station  (LTTS)  of  Applied  Research  Laboratories,  The  University  of  Texas 
at  Austin  ( ARL : UT ) . 7  The  data  consist  of  500  returns  of  a  pulsed  cw 
signal  scattered  from  the  wind-roughened  lake  surface  at  a  10.5°  grazing 
angle.  The  transmitted  frequency  was  80  kHz,  the  pulse  length  was  100 
nsec,  and  the  horizontal  transmit  beamwidth  was  approximately  15°. 
Winds  were  from  the  north  at  approximately  35-50  km/h  and  had  a  fetch  of 
2.4  km.  Wave  heights  were  estimated  visually  to  be  about  0.3-0. 6  m 
(trough  to  crest). 

The  bottom  reverberation  data  were  collected  from  a  bottom  that  was 
very  uniform  and  consisted  of  loose  silty  sand  mixed  with  shell.  The 
sonar  had  a  horizontal  transmit  beamwidth  of  approximately  3°  and 
operated  at  a  frequency  of  18  kHz  and  a  pulse  length  of  100  usee.  These 
data  were  collected  at  a  grazing  angle  of  35°.  Fifty-seven  returns  were 
used  to  form  the  bottom  reverberation  ensembles.  The  side  scan  sonar 
was  moving  in  a  straight  line  during  data  collection,  resulting  in 
returns  from  different  parts  of  the  bottom. 

All  reverberation  data  were  sampled  digitally  in  time  by  a  12-bit 
analog-to-digital  (A/D)  converter  representing  a  range  of  f 10  V.  The 
samples  from  each  return  occurring  at  the  same  time  after  transmission 
were  associated  together  into  ensembles.  Thirteen  ensembles  of  surface 


3 


reverberation  were  formed.  Each  ensemble  was  separated  in  time  by 
250  Msec.  Fifty-four  ensembles  of  bottom  reverberation  were  formed, 
separated  in  time  by  110  Msec. 

III.  ENSEMBLE  VALIDATION  AND  TESTING 

Each  of  the  ensembles  was  tested  for  randomness,  independence, 
homogeneity,  and  normality.  The  runs  up  and  down  test  for  randomness, 
the  Kolmogorov-Smirnov  two-sample  test  for  homogeneity,  the  Wilcoxon 
rank  sum  test  for  homogeneity,  Pearson's  test  of  skewness  for  normality, 
and  D'Agostino's  test  for  normality  were  applied  to  the  surface 
reverberation  data.  In  addition,  the  Kendall  r->nk  correlation  tests  for 
randomness  and  independence  were  applied  to  the  bottom  reverberation 
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data.  Descriptions  of  these  tests  can  be  found  elsewhere.  ’ 

The  results  of  these  tests  are  shown  in  Fig.  1  for  the  bottom 
reverberation  and  in  Fig.  2  for  the  surface  reverberation.  The  results 
of  each  test  are  displayed  as  the  probability  of  a  Type  I  error  for  each 
ensemble.  The  probability  of  a  Type  I  error  is  the  probability  of 
making  a  mistake  by  announcing  the  alternative  hypothesis  when  the  null 
hypothesis  is  true  (e.g.,  announcing  a  non-Gaussian  ensemble  when  it  is 
actually  Gaussian).  A  small  value  of  the  probability  of  a  Type  I  error 
indicates  the  null  hypothesis  of  the  test  can  be  rejected  with  some 
confidence. 

The  probability  of  a  Type  I  error  is  itself  a  random  variable  and 
is  uniformly  distributed  on  the  interval  [0,l]  if  the  null  hypothesis  is 
true,  i.e.,  if  (>  denotes  the  probability  of  a  Type  I  error  then 


Prob[u  :;P;Hq]  =  p  ,  O  P  1 

When  many  independent  ensembles  of  the  same  random  process  are  tested, 
their  outcomes  can  be  combined  to  give  an  overall  probability  that  the 
particular  hypothesis  being  tested  is  valid  for  the  entire  set  of 
ensembles.  If  this  probability  is  low  (say,  less  than  0.05),  then  the 
null  hypothesis  of  the  test  can  be  rejected  with  some  confidence  for  the 
entire  set  of  ensembles.  This  combined  probability  is  indicated  in  the 
figures  as  Pc. 

The  method  for  combining  the  nrobabi lities  is  Edgington's  normal 
1?  IT 

curve  method.  *  The  mean  of  the  probabilities  is  computed  and,  if 

the  probabilities  are  uniformly  distributed  on  the  interval  [o,l]  ,  then 
according  to  sampling  theory  the  sample  mean  will  be  normally 
distributed  with  a  mean  of  0.5  and  a  standard  deviation  of  l/12n,  where 
n  is  the  number  of  probabilities  combined.  Norma1  curve  tables  can  then 
be  referenced  to  determine  the  probability  that  the  sample  mean 
represents  the  mean  of  uniform  [0,l]  random  variables.  For  some 
tests,  for  example  the  runs  test,  the  exact  probability  of  a  Type  I 
error  cannot  always  be  computed,  but  only  the  upper  and  lower  bounds  on 
the  probability.  Thus  only  an  upper  and  lower  bound  on  Pc  can  be 
computed.  For  these  tests,  the  bounds  on  the  probability  of  a  Type  I 
error  are  indicated  by  two  lines  on  the  plots  in  Figs.  .1  and  2,  and  the 
corresponding  upper  and  lower  limits  of  Pc  are  given.  In  some  cases  the 
upper  and  lower  limits  are  the  same. 

As  can  be  seen,  the  ensembles  of  both  surface  and  bottom 
reverberation  can  be  accepted  as  consisting  of  independent,  identically 
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distributed  random  variables.  The  values  of  for  the  tests  for 
randomness,  independence,  and  homogeneity  are  greater  than  such  a  low 
value  as  0.05,  indicating  that  the  null  hypotheses  of  these  tests  cannot 
be  rejected  with  any  degree  of  confidence. 

However,  the  results  of  Pearson's  test  of  skewness  and  D'Agostino's 
test  for  normality  indicate  that  the  hypothesis  of  normality  can  be 
rejected  for  both  the  surface  and  bottom  reverberation.  Although  the 
value  of  PQ  for  Pearson's  test  of  skewness  for  the  surface  reverberat ion 
(Fig.  2)  does  not  indicate  that  the  Gaussian  assumption  can  be  rejected, 
the  results  of  D’Agostino's  test  give  a  confident  rejection  of 
normality.  Since  D'Agostino's  test  is  more  sensitive  to  the  kurtosis  of 
the  data,  these  results  indicate  a  rejection  of  normality  primarily  due 
to  a  significant  kurtosis.  Although  not  shown  here,  Pearson's  test  of 
kurtosis  was  also  performed  on  the  surface  reverberation  ensembles  and 
the  results  also  indicated  a  confident  rejection  of  normality.  The 
normality  of  the  bottom  reverberation  data,  Fig.  1,  can  be  marginally 
rejected  by  Pearson's  test  of  skewness  and  confidently  rejected  by 
D'Agostino's  test.  As  shown  later,  both  the  surface  and  bottom 
reverberation  had  larger  than  normal  kurtosis,  indicating  heavier  tails 
than  a  Gaussian  density. 

Although  the  ensembles  as  a  whole  tested  to  be  non-Gaussian,  it  is 
not  clear  that  all  the  ensembles  of  each  type  of  reverberation  were 
produced  from  the  same  underlying  non-Gaussian  distribution.  For  this 
reason,  a  k  sample  homogeneity  test^  was  applied  to  the  13  surface 
reverberation  ensembles  and  the  54  bottom  reverberation  ensembles.  The 
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results  of  this  test  indicated  that  the  hypothesis  of  homogeneity  could 
not  be  rejected  for  the  13  ensembles  of  surface  reverberation,  nor  for 
the  54  ensembles  of  bottom  reverberation.  The  probability  of  Type  I 
error  was  0.9536  and  0.8054,  respectively.  This  result  also  is  a 
justification  for  combining  ensembles  together  during  density 
estimation,  which  will  be  discussed  in  the  next  section. 

IV.  DENSITY  ESTIMATION 
A.  Technique  of  density  estimation 

The  nonparametr ic  estimation  of  the  probability  density  for  a  data 
set  has  been  dealt  with  by  many  authors,  notably  Rosenblatt^  and 
Parzen.^  The  two  general  methods  used  in  density  estimation  are  series 
approximation  using  orthogonal  functions  and  series  of  kernel  functions. 
The  latter  method  is  used  here.  If  X1,X2,...,Xn  are  independent, 
identically  distributed  random  variables,  then  the  kernel  density 
estimate  is 

fn(x)  =  n  £,  hW  K(~hW) 

where  K(x)  satisfies  certain  regularity  conditions  (cf.  Parzen).  The 
function  h(n)  defines  a  sequence  of  kernel  widths. 

In  using  estimates  of  this  form,  a  choice  must  be  made  for  K(x)  and 
h(n).  Many  choices  for  the  kernel  function  were  considered,  including 
rectangular,  quadratic,  and  exponential  forms.  The  kernel  function  used 
is  due  to  Silverman:^ 


if  |x|<l 

if  1  <  |  x  |<2 
otherwi se 

This  quartic  was  selected  for  its  vanishing  support  outside  the  interval 

[-2,2]. 

There  are  two  classes  of  choices  for  h(n):  constant  widths  or 

variable  widths.  Kernel  estimates  have  been  shown  to  be  consistent  for 

both  classes  of  h(n),  implying  that  either  choice  will  suffice  for  very 

large  sample  sets.  If  the  kernel  width  is  constant,  then  asymptotic 

-1/5 

results  prove  that  the  optimal  kernel  width  is  proportional  to  n  . 
Kernel  estimates  using  variable  widths  have  been  investigated  by  Penrod 
and  Machell  et  al.,  as  well  as  others.18,19  In  this  study,  both 
variable  and  constant  width  kernel  density  estimates  were  attempted. 
For  reasons  given  below,  estimates  based  on  constant  width  kernels  ere 
presented.  The  selection  of  the  constant  was  initially  based  on  the 
optimal  h(n)  defined  by  Rosenblatt,  although  ignorance  of  the  true 
underlying  density  necessitated  empirical  determination  of  an  acceptable 
constant. 

B.  Results  of  density  estimates 

Both  the  fixed  and  variable  width  kernel  density  estimation 
algorithms  were  run  on  the  sample  data.  The  results  indicated  that  the 
fixed  width  estimates  provided  less  smooth  densities,  while  the  variable 
width  density  estimates  yielded  very  smooth  densities.  The  comparison 


1/4  x4  -  1/2  | x | 3  +  1/2 

K(x)  *  1/4  | x | ( 2- | x | ) 3 
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of  the  moments  of  the  density  estimates  with  the  sample  moments  of  the 

data  demonstrated  large  disparities  for  the  variable  width  estimates 

based  on  the  k  nearest  neighbor  technique,^  particularly  in  the 

measurements  of  variance  and  kurtosis.  The  fixed  width  estimates 

provided  much  closer  agreement  for  all  of  the  moment  statistics.  For 

this  reason  the  variable  width  estimates  were  deemed  unreasonable,  and 

only  densities  from  fixed  width  estimates  are  given.  The  criteria  for 

selecting  the  kernel  width  were  based  on  the  subjective  evaluation  of  the 

"smoothness"  of  the  density  estimate  along  with  "reasonable"  agreement 

of  the  moments  of  the  density  with  the  sample  moments  of  the  data.  For 

the  surface  reverberation  data,  a  kernel  width  of  112.2  was  used  while  a 

width  of  70.0  was  used  for  the  bottom  reverberation  data.  Guidelines 

for  the  selection  of  kernel  widths  for  density  estimates  may  be  found 
18 

elsewhere. 

In  Fig.  3  the  individual  density  estimates  for  the  surface 
reverberation  data  are  presented  in  a  three-dimensional  plot  format. 
Each  estimate  was  computed  on  a  mesh  of  101  points  equally  spaced  in  an 
interval  containing  the  minimum  and  maximum  value  of  the  entire  data 
set.  Although  some  variation  exists  in  the  individual  estimates,  the  k 
sample  homogeneity  test  gives  confidence  that  the  estimates  are 
representative  of  the  same  density.  Thus  it  is  natural  to  calculate  the 
mean  density  estimate 

f(x)  '  Fij?,  fn,j(x)  > 
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where  M  is  the  total  number  of  independent  density  estimates  and  fn  j(x) 
is  the  density  estimate  for  the  jth  ensemble.  f(x)  is  presented  as  a 
reasonable  estimate  of  the  true  probability  density  of  the  underlying 
acoustic  process.  Note  that  f  itself  is  a  kernel  estimate  and  hence  is 
a  density  function.  Figure  4  is  a  plot  of  the  mean  density  estimate  for 
the  surface  reverberation. 

The  average  density  estimate  is  compared  to  a  Gaussian  density  of 
the  same  mean  and  variance  as  the  sample  data  in  Fig.  4  to  allow  a 
visual  comparison  of  this  density  estimate  to  a  known  (i.e.,  Gaussian) 
density.  To  facilitate  this  comparison,  a  log  transformation  of  the 
density  is  performed;  these  results  are  also  presented  in  Fig.  4.  This 
transformation  tends  to  highlight  the  difference  in  the  tails  of  the 
density.  As  can  be  seen,  the  density  estimate  appears  to  be  nearly 
Gaussian,  but  the  log  plot  indicates  a  significant  deviation  in  the 
tails.  The  density  estimate  has  heavier  tails  than  the  Gaussian. 

To  ascertain  how  well  the  density  estimate  represents  the  data  set, 
integral  estimates  of  the  mean,  variance,  skew,  and  kurtosis  based  on 
the  density  estimate  were  computed  and  compared  to  the  sample  estimates 
of  the  mean,  variance,  skew,  and  kurtosis  computed  from  the  entire  data 
set.  The  results  are  shown  in  Table  I.  Approximate  confidence 

intervals  are  given  in  Table  II  to  provide  an  estimate  of  error  bounds 
for  the  sample  mean  and  variance.  These  confidence  intervals  are 
computed  in  two  ways.  The  first  assumes  the  data  to  be  normal  (which 
they  are  not)  and  the  intervals  are 


10 


for  the  mean  and 


on 

for  the  variance.  Here  Z(p)  and  vn(p)  are  the  pth  quantiles  of  the 
standard  normal  distribution  and  the  chi  square  distribution  with  n 
degrees  of  freedom,  respectively.  The  value  of  <*  was  chosen  to  be  0.05, 
resulting  in  95%  confidence  intervals.  The  second  confidence  intervals 
are  calculated  simply  as  T  +  2  \J var(T),  where  T  is  the  mean  or  variance 
statistic,  and  where  var(T)  is  the  sampling  variance  of  T,  given 

21  —  ^  o  Ao  ^  *4 

approximately  by  var(X)%  t  /n  and  var(^  )  «  (-y^-l)  fT  /n*  Here  n  is 
the  total  number  of  samples  used  to  estimate  these  moments.  The  means 
computed  for  the  kernel  densities  easily  reside  in  their  respective 
confidence  intervals.  The  variances  for  the  kernel  densities  do  not  lie 
in  the  computed  confidence  intervals.  It  was  observed  that  the 
variances  could  be  lowered  and  brought  into  the  confidence  intervals  by 
decreasing  the  kernel  width,  but  only  at  the  expense  of  the  smoothness 
of  the  density  estimate.  Likewise  the  skew  and  kurtosi^  could  be 
brought  into  better  agreement  with  the  sample  estimates  by  decreasing 
the  kernel  width.  The  kernel  widths  that  were  chosen  provided  the  best 
agreement  of  the  moments  of  the  densities  with  the  sample  estimates 
while  still  maintaining  a  reasonably  smooth  density  estimate.  Variable 
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width  kernel  techniques  were  also  tried,  but  provided  even  less 
agreement  with  the  sample  estimates  of  the  moments. 

The  individual  density  estimates  of  the  bottom  reverberation  are 
shown  in  Fig.  5.  The  54  ensembles  of  bottom  reverberation  have  been 
grouped  into  9  ensembles  of  342  samples  each  to  provide  more  samples  for 
the  individual  estimates  and  thus  improve  their  accuracy.  The  mean 
density  estimate  for  the  bottom  reverberation  is  compared  to  a  Gaussian 
density  in  Fig.  6.  This  density  estimate  also  has  heavier  tails  than 
the  Gaussian.  It  can  be  observed  that,  due  to  a  lack  of  sample  values 
in  a  limited  range  within  the  tails,  the  fixed  width  kernel  estimates  a 
very  low  value  for  the  probability  density  in  this  range.  This  is 
particularly  obvious  on  the  log  plot.  Note  also  that  there  appear  to  be 
a  few  outliers  on  the  extremes  of  the  tail.  It  cannot  be  determined 
whether  this  apparent  outlier  group  is  an  anomaly  of  the  sampled 
process,  or  is  in  fact  a  feature  of  the  density.  It  is  felt  that  a 
larger  data  set  would  provide  a  more  accurate  estimate  of  the  density  in 
the  tails.  Table  I  is  a  comparison  of  the  integral  estimates  and  sample 
estimates  of  the  moments  of  the  bottom  reverberation.  The  bottom 
reverberation  has  a  larger  kurtosis  than  the  surface  reverberation, 
resulting  in  heavier  tails  and  a  more  significant  departure  from 
normality. 

V.  SUMMARY 

Thirteen  surface  reverberation  ensembles  of  size  500  and  54  bottom 
reverberation  ensembles  of  size  57  were  formed  and  tested  for 
randomness. 
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independence,  homogeneity,  and  normality.  Both  the  surface 
reverberation  and  the  bottom  reverberation  were  found  to  be  non- 
Gaussian.  A  k  sample  homogeneity  test  was  applied  to  the  data  to  verify 
that  the  ensembles  represented  the  same  underlying  distributions. 
Univariate  density  estimates  of  the  surface  reverberation  and  the  bottom 
reverberation  were  then  computed  using  fixed  width  kernel  techniques  and 
compared  to  a  Gaussian  density.  The  bottom  reverberation  density 
appeared  to  differ  more  from  a  Gaussian  density  than  the  surface 
reverberation  density  due  to  heavier  tails. 
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TABLE  I.  Comparison  of  the  sample  statistics  mean,  variance, 
skew,  and  kurtosis  with  the  statistics  calculated  from  the 


TABLE  II.  Approximate  95..  confidence  interval  for 
population  statistics. 
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Surface  1  (-18.93  ,  -3.43)  (102412  ,  109703) 

Reverberation 

2  (-19.26  ,  -3.10)  (101882  ,  110044) 


Bottom  1  (-6.01  ,  5.59)  (26762  ,  29574) 

Reverberation 

2  (-6.25  ,  5.83)  (26243  ,  29987) 
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